RussellSpitzer commented on pull request #2494:
URL: https://github.com/apache/iceberg/pull/2494#issuecomment-823221895
Yeah, because of classloader issues I was only able to write tests for
Parquet. Both the ORC and Avro cases are set to ignore in the procedure/action tests.
> On Apr 20, 2021, at 6:48 AM, László Pintér ***@***.***> wrote:
>
>
> @lcspinter commented on this pull request.
>
> In data/src/main/java/org/apache/iceberg/data/DataUtil.java:
>
> > +    } else if (format.contains("orc")) {
> > +      return listOrcPartition(partitionKeys, uri, spec, conf, metricsConfig, mapping);
> > +    } else {
> > +      throw new UnsupportedOperationException("Unknown partition format: " + format);
> > +    }
> > +  }
> > +
> > +  private static List<DataFile> listAvroPartition(Map<String, String> partitionPath, String partitionUri,
> > +                                                  PartitionSpec spec, Configuration conf) {
> > +    try {
> > +      Path partition = new Path(partitionUri);
> > +      FileSystem fs = partition.getFileSystem(conf);
> > +      return Arrays.stream(fs.listStatus(partition, HIDDEN_PATH_FILTER))
> > +          .filter(FileStatus::isFile)
> > +          .map(stat -> {
> > +            // Avro file statistics cannot be calculated without reading the file.
> @aokolnychyi I changed it back to the original value (-1L) and will open a
> separate PR to address this issue.
> Setting -1 as the rowCount results in a
> [failure|https://github.com/apache/iceberg/blob/a79de571860a290f6e96ac562d616c9c6be2071e/core/src/main/java/org/apache/iceberg/DataFiles.java#L288]
> when calling DataFiles.Builder.build(). I checked the associated test suite,
> TestSparkTableUtil, and it seems that importing Spark tables whose data
> files are in Avro format [was not
> tested|https://github.com/apache/iceberg/blob/a79de571860a290f6e96ac562d616c9c6be2071e/spark2/src/test/java/org/apache/iceberg/spark/source/TestSparkTableUtil.java#L139]
> at all.
>
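The quoted `listAvroPartition` lists a partition directory with Hadoop's `FileSystem.listStatus(Path, PathFilter)`, skips hidden entries via `HIDDEN_PATH_FILTER`, and keeps only regular files. Below is a minimal, dependency-free sketch of that listing step. It swaps Hadoop's `FileSystem` for `java.nio.file.Files` so it runs on its own, and it assumes the hidden-file rule is "name starts with `_` or `.`" (for example `_SUCCESS` markers and `.crc` checksum files). `PartitionListingSketch`, `isVisible`, and `listDataFiles` are hypothetical names, not Iceberg code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartitionListingSketch {

  // Assumed hidden-file rule: skip names starting with "_" (e.g. _SUCCESS) or "." (e.g. .crc files).
  static boolean isVisible(Path p) {
    String name = p.getFileName().toString();
    return !name.startsWith("_") && !name.startsWith(".");
  }

  // Lists the visible regular files directly under a partition directory (no recursion).
  static List<Path> listDataFiles(Path partition) throws IOException {
    // Files.list holds an open directory handle, so close the stream when done.
    try (Stream<Path> entries = Files.list(partition)) {
      return entries
          .filter(PartitionListingSketch::isVisible)
          .filter(Files::isRegularFile) // plays the role of FileStatus::isFile
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("partition");
    Files.createFile(dir.resolve("00000-data.avro"));
    Files.createFile(dir.resolve("_SUCCESS"));
    Files.createFile(dir.resolve(".00000-data.avro.crc"));
    Files.createDirectory(dir.resolve("subdir"));
    System.out.println(listDataFiles(dir).size()); // prints 1: only 00000-data.avro survives
  }
}
```

Each surviving file would then still need a `DataFile` built for it, which is where the Avro record-count question in this thread comes in.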
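Here is why the `-1L` placeholder breaks the import. The check linked above in `DataFiles.Builder.build()` rejects a negative record count. A `-1L` meaning "unknown" for Avro files therefore fails as soon as the `DataFile` is built. The sketch below is a hypothetical stand-in, not Iceberg's `DataFiles.Builder`. It mirrors only that record-count validation so the failure can be reproduced without Iceberg on the classpath. The class names and exception message are assumptions.

```java
public class RecordCountCheckSketch {

  // Hypothetical mirror of the record-count validation in DataFiles.Builder.build();
  // NOT Iceberg's class, and it omits every other field the real builder checks.
  static final class DataFileBuilderSketch {
    private String path;
    private long recordCount = -1L; // -1 stands for "not known / not set"

    DataFileBuilderSketch withPath(String newPath) {
      this.path = newPath;
      return this;
    }

    DataFileBuilderSketch withRecordCount(long newRecordCount) {
      this.recordCount = newRecordCount;
      return this;
    }

    String build() {
      if (path == null) {
        throw new IllegalArgumentException("File path is required");
      }
      // The check that rejects the -1L placeholder.
      if (recordCount < 0) {
        throw new IllegalArgumentException("Record count is required");
      }
      return path + " (" + recordCount + " rows)";
    }
  }

  // Returns true if build() accepts the given record count, false if it throws.
  static boolean builds(long recordCount) {
    try {
      new DataFileBuilderSketch()
          .withPath("part-00000.avro")
          .withRecordCount(recordCount)
          .build();
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(builds(-1L)); // prints false: the placeholder is rejected
    System.out.println(builds(42L)); // prints true: an actual count passes
  }
}
```

Any fix in the follow-up PR has to supply a real, non-negative row count. For Avro, the comment in the diff notes that this means reading the file.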
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]