[
https://issues.apache.org/jira/browse/PARQUET-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297218#comment-17297218
]
Gabor Szadovszky commented on PARQUET-1992:
-------------------------------------------
[~mayaa],
bq. Benefit - the regular dev flow of building and running unit tests won't
require downloading files and connectivity to github
We already need to download a bunch of files from the internet (maven plugins
and dependencies). So even building or testing from the tarball requires
downloading.
bq. If so, they could be run by maven-failsafe-plugin as part of the
integration-test/verify phase and missing the interop files would not fail "mvn
install" but only "mvn verify"
AFAIK the failsafe plugin is configured to be executed at {{mvn verify}}, and
since {{install}} comes after the {{verify}} phase in the lifecycle, {{mvn
install}} would still fail if the integration tests could not be executed. BTW,
we already have an integration test:
[FileEncodingsIT|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/encodings/FileEncodingsIT.java].
bq. 2. Should the files for interop tests be downloaded directly in the test or
using submodules in a separate maven profile for integration-test or as part of
an existing profile, e.g. ci-test?
I think there is another option: downloading the required files directly via
maven. I am not sure which plugin is capable of this or whether it is better
than downloading from the test by java code, but it is still an option.
bq. Git submodules provides flows for handling downloaded file versions -
specific to a commit or a branch.
A github download link can contain the hash of the changeset, so it is capable
of handling file versions as well.
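To illustrate the "download from the test by java code" option combined with a
hash-pinned link, here is a minimal sketch. It is hypothetical and not existing
parquet-mr code: the class name, the per-commit cache layout and the use of the
raw.githubusercontent.com URL scheme are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of parquet-mr): fetches test files from a
// GitHub raw link pinned to a commit hash and caches them locally.
final class PinnedTestFiles {
  private PinnedTestFiles() {}

  // Builds a raw download URL for a file at a specific commit:
  // https://raw.githubusercontent.com/<owner>/<repo>/<commit>/<path>
  static URI pinnedUrl(String repo, String commitHash, String relativePath) {
    return URI.create("https://raw.githubusercontent.com/"
        + repo + "/" + commitHash + "/" + relativePath);
  }

  // Returns the cached copy if present; otherwise downloads it first.
  // Files are cached per commit hash, so changing the hash fetches new versions.
  static Path fetch(Path cacheDir, String repo, String commitHash, String relativePath)
      throws IOException {
    Path target = cacheDir.resolve(commitHash).resolve(relativePath);
    if (Files.exists(target)) {
      return target;
    }
    Files.createDirectories(target.getParent());
    // Download to a temporary file first so an interrupted download
    // never leaves a truncated file at the target path.
    Path tmp = Files.createTempFile(target.getParent(), "download", ".tmp");
    try (InputStream in = pinnedUrl(repo, commitHash, relativePath).toURL().openStream()) {
      Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
      Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
      Files.deleteIfExists(tmp);
      throw e;
    }
    return target;
  }
}
```

A test could then call something like
{{fetch(Paths.get("target", "interop-cache"), "apache/parquet-testing", "<commit>", "data/<file>.parquet")}},
where {{<commit>}} and the file name are placeholders. Because the cache is keyed
by the commit hash, a file is downloaded again only when the pinned hash changes.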
bq. Git submodules manages downloading files only when needed
This is not true in the current situation. We are invoking {{git submodule
update}} in the {{initialize}} phase of maven, so we download the whole
{{parquet-testing}} repo (at a specific changeset) at least once.
bq. It is aligned with the integration tests in parquet-cpp (arrow)
How does parquet-cpp solve the similar issue with the tarball?
bq. The files can be used for additional interop tests of other features
I agree, this was the first thing I liked about git submodules. Meanwhile, I've
started thinking about implementing interoperability tests, and now I think
such tests could be implemented in the {{parquet-testing}} repo, as they do not
require low-level access to the {{parquet-mr}} classes like unit tests do. My
fear about git submodules is that the {{parquet-testing}} repo might grow big,
and AFAIK you cannot control which files/directories to sync, only the
changeset.
bq. The tarball still won't contain the interop files, so the integration tests
will fail on it.
I think we should not add the parquet files into the source tarball in any way.
bq. Anyway, both ways are acceptable, so I'll implement whatever sounds best to
the community.
I currently agree with [[email protected]] about downloading the required files.
Meanwhile, I am curious about the parquet-cpp solution.
bq. BTW, when investigating the profiles, it seems to me that there is an old
reference to the "travis" maven profile mentioned in the .travis.yml file,
though its new name is "ci-test".
That's a good catch! We'll fix it.
> Cannot build from tarball because of git submodules
> ---------------------------------------------------
>
> Key: PARQUET-1992
> URL: https://issues.apache.org/jira/browse/PARQUET-1992
> Project: Parquet
> Issue Type: Bug
> Reporter: Gabor Szadovszky
> Priority: Blocker
>
> Because we use git submodules (to get test parquet files) a simple "mvn clean
> install" fails from the unpacked tarball due to "not a git repository".
> I think we would have 2 options to solve this situation:
> * Include all the required files (even those only needed for testing) in the
> tarball and somehow avoid the git submodule update when executed in a non-git
> environment
> * Make the downloading of the parquet files and the related tests optional so
> it won't fail the build from the tarball