wypoon opened a new issue #2169:
URL: https://github.com/apache/iceberg/issues/2169
The Impala project has been adding support for Iceberg, and we have been
running interop tests against this work in progress.
We have found that Spark fails to read an Iceberg table written by Impala.
On investigation, we found that in the manifest written by Impala, the file
path has an extraneous '/':
```
"data_file":{
"file_path":"hdfs://nn:8020/hive_warehouse/iceberg_table/data//ad4ebfa17d4a94ed-9bac8cde00000000_1968527942_data.0.parq",
"file_format":"PARQUET",
"partition":{},
...
}
```
Note the extraneous '/' before the filename.
As a result, the `InputFile` is not found in
`org.apache.iceberg.spark.source.RowDataReader#open(FileScanTask)`. That method
calls `BaseDataReader#getInputFile(FileScanTask)`, which looks up the path of
the file to scan (the path recorded in the manifest) in a
`Map<String, InputFile>`. The lookup misses because the keys in the map are
normalized paths, while the path from the manifest still contains the
double '/'.
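
To illustrate the mismatch, here is a minimal standalone sketch, not the actual Iceberg code. The class `PathLookupSketch` and the helper `normalize` are hypothetical; the helper only approximates path normalization by collapsing repeated '/' characters, and a plain `String` value stands in for `InputFile`.

```java
import java.util.HashMap;
import java.util.Map;

class PathLookupSketch {
    // Hypothetical stand-in for path normalization (an assumption, not Iceberg's code):
    // collapse repeated '/' in the path while leaving the "scheme://" separator intact.
    static String normalize(String location) {
        int schemeEnd = location.indexOf("://");
        int start = schemeEnd < 0 ? 0 : schemeEnd + 3;
        return location.substring(0, start)
            + location.substring(start).replaceAll("/{2,}", "/");
    }

    public static void main(String[] args) {
        // Path exactly as recorded in the manifest written by Impala.
        String manifestPath = "hdfs://nn:8020/hive_warehouse/iceberg_table/data//"
            + "ad4ebfa17d4a94ed-9bac8cde00000000_1968527942_data.0.parq";

        // The map is keyed by the normalized path; the String value stands in for InputFile.
        Map<String, String> inputFiles = new HashMap<>();
        inputFiles.put(normalize(manifestPath), "input-file");

        // Looking up the raw manifest path misses because of the extra '/'.
        System.out.println("raw lookup: " + inputFiles.get(manifestPath));
        // Looking up the normalized path finds the entry.
        System.out.println("normalized lookup: " + inputFiles.get(normalize(manifestPath)));
    }
}
```

In this sketch the raw lookup returns `null` while the normalized lookup succeeds, which is the same shape of failure as described above: the key and the lookup string differ only by the extra '/'.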