[ 
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297218#comment-17297218
 ] 

Gabor Szadovszky commented on PARQUET-1992:
-------------------------------------------

[~mayaa], 

bq. Benefit - the regular dev flow of building and running unit tests won't 
require downloading files and connectivity to github bq.
We already need to download a bunch of files from the internet (maven plugins 
and dependencies). So even a build from the tarball requires downloading if we 
want to build/test.

bq. If so, they could be run by maven-failsafe-plugin as part of the 
integration-test/verify phase and missing the interop files would not fail "mvn 
install" but only "mvn verify" bq.
AFAIK the failsafe plugin is configured to be executed at {{mvn verify}}, and as 
the {{install}} phase comes after {{verify}} in the lifecycle, {{mvn install}} 
would still fail if the integration tests could not be executed. BTW, we 
already have an integration 
test: 
[FileEncodingsIT|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/encodings/FileEncodingsIT.java].
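
To illustrate the lifecycle argument above, here is a minimal sketch that models the phase ordering of Maven's default lifecycle and lists what runs for a requested phase. The class {{MavenPhases}} and the method {{phasesRunFor}} are hypothetical names made up for this illustration. They are not part of Maven or parquet-mr. The sketch also assumes the usual failsafe bindings to the {{integration-test}} and {{verify}} phases.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a model of Maven's default lifecycle ordering.
class MavenPhases {
    // Phases of the default lifecycle, in execution order.
    static final List<String> DEFAULT_LIFECYCLE = Arrays.asList(
        "validate", "initialize",
        "generate-sources", "process-sources",
        "generate-resources", "process-resources",
        "compile", "process-classes",
        "generate-test-sources", "process-test-sources",
        "generate-test-resources", "process-test-resources",
        "test-compile", "process-test-classes", "test",
        "prepare-package", "package",
        "pre-integration-test", "integration-test", "post-integration-test",
        "verify", "install", "deploy");

    // Requesting a phase runs every phase up to and including it.
    static List<String> phasesRunFor(String target) {
        int idx = DEFAULT_LIFECYCLE.indexOf(target);
        if (idx < 0) {
            throw new IllegalArgumentException("unknown phase: " + target);
        }
        return DEFAULT_LIFECYCLE.subList(0, idx + 1);
    }

    public static void main(String[] args) {
        // "mvn install" passes through integration-test and verify.
        System.out.println(phasesRunFor("install").contains("integration-test")); // true
        System.out.println(phasesRunFor("install").contains("verify"));           // true
        // "mvn package" stops before the integration tests.
        System.out.println(phasesRunFor("package").contains("integration-test")); // false
    }
}
```

So moving the interop tests to failsafe would keep them out of {{mvn package}} and {{mvn test}}, but not out of {{mvn install}}.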

bq. 2. Should the files for interop tests be downloaded directly in the test or 
using submodules in a separate maven profile for integration-test or as part of 
an existing profile, e.g. ci-test? bq.
I think there is another option: downloading the required files directly from 
maven. I am not sure which plugin is capable of this, or whether it is better 
than downloading from the test by java code, but it is still an option.

bq. Git submodules provides flows for handling downloaded file versions - 
specific to a commit or a branch. bq.
A github download link can contain the hash of the changeset, so it is also 
capable of handling file versions.
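
As a rough sketch of what a changeset-pinned download could look like from java code: the class {{PinnedTestFile}}, its methods, the commit hash {{0123abc}} and the path {{data/example.parquet}} are all placeholders invented for this example. They do not refer to a real changeset or file. It also assumes the files would be fetched through the {{raw.githubusercontent.com}} URL form.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: fetch a test file pinned to a specific changeset.
class PinnedTestFile {
    // Builds a raw-content URL. The commit hash in the URL pins the file version.
    static URI rawUrl(String owner, String repo, String commit, String path) {
        return URI.create(String.format(
            "https://raw.githubusercontent.com/%s/%s/%s/%s",
            owner, repo, commit, path));
    }

    // Downloads the file only if it is not already present locally.
    static Path fetchIfMissing(URI source, Path target) throws IOException {
        if (Files.exists(target)) {
            return target; // already cached, no network access needed
        }
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (InputStream in = source.toURL().openStream()) {
            Files.copy(in, target);
        }
        return target;
    }

    public static void main(String[] args) {
        // Placeholder hash and path, not a real changeset or file.
        URI url = rawUrl("apache", "parquet-testing", "0123abc", "data/example.parquet");
        System.out.println(url);
        // A test could then call, for example:
        // fetchIfMissing(url, Paths.get("target", "interop", "example.parquet"));
    }
}
```

Because the target is checked first, each file is downloaded at most once per changeset, and only the files a test actually asks for are fetched.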
bq. Git submodules manages downloading files only when needed bq.
This is not true in the current situation. We are invoking {{git submodule 
update}} in the {{initialize}} phase of maven, so we are downloading the 
whole {{parquet-testing}} repo (at a specific changeset) at least once.
bq. It is aligned with the integration tests in parquet-cpp (arrow) bq.
How does parquet-cpp solve the similar issue with the tarball?
bq. The files can be used for additional interop tests of other features bq.
I agree, this was the first thing I liked about git submodules. Meanwhile, I've 
started thinking about implementing interoperability tests and now I think such 
tests could be implemented in the {{parquet-testing}} repo as they do not 
require low level access to the {{parquet-mr}} classes like unit tests do. My 
fear about the git submodules is that the {{parquet-testing}} repo might grow 
big and AFAIK you cannot control which files/directories you would like to 
sync, only the changeset.

bq. The tarball still won't contain the interop files, so the integration tests 
will fail on it. bq.
I think we should not add the parquet files into the source tarball in any way. 

bq. Anyway, both ways are acceptable, so I'll implement whatever sounds best to 
the community. bq.
I currently agree with [~sha...@uber.com] about downloading the required files. 
Meanwhile I am curious about the parquet-cpp solution.

bq. BTW, when investigating the profiles, it seems to me that there is an old 
reference to the "travis" maven profile mentioned in the .travis.yml file, 
though its new name is "ci-test".  bq.
That's a good catch! We'll fix it.

> Cannot build from tarball because of git submodules
> ---------------------------------------------------
>
>                 Key: PARQUET-1992
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1992
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Gabor Szadovszky
>            Priority: Blocker
>
> Because we use git submodules (to get test parquet files) a simple "mvn clean 
> install" fails from the unpacked tarball due to "not a git repository".
> I think we would have 2 options to solve this situation:
> * Include all the required files (even only for testing) in the tarball and 
> somehow avoid the git submodule update when executed in a non-git 
> environment
> * Make the downloading of the parquet files and the related tests optional so 
> it won't fail the build from the tarball
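
For the "non-git environment" case in the first option, one possible check is sketched below. The class {{GitCheckout}} and the method {{isInsideGitWorkTree}} are hypothetical names used only for this illustration, and the sketch assumes that the presence of a {{.git}} entry in the directory or one of its parents is a good enough signal.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: detect whether a build runs from a git checkout or from an unpacked tarball.
class GitCheckout {
    // Walks from `start` up to the filesystem root looking for a ".git" entry.
    // Files.exists is used rather than isDirectory because ".git" is a plain
    // file, not a directory, inside submodules and linked worktrees.
    static boolean isInsideGitWorkTree(Path start) {
        for (Path dir = start.toAbsolutePath().normalize(); dir != null; dir = dir.getParent()) {
            if (Files.exists(dir.resolve(".git"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Check the current working directory.
        System.out.println(isInsideGitWorkTree(Paths.get("")));
    }
}
```

A build step or a test could use such a check to skip the submodule update, or the tests that depend on it, when it returns false.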



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
