[GitHub] [iceberg] wypoon opened a new issue #2169: Lookup of data file by path string is not robust to non-normalized paths

GitBox Wed, 27 Jan 2021 19:05:02 -0800


wypoon opened a new issue #2169:
URL: https://github.com/apache/iceberg/issues/2169



   The Impala project has been adding support for Iceberg, and we have been doing interop testing of this work in progress.
   We have found that Spark fails to read an Iceberg table written by Impala.
   On investigation, we found that in the manifest written by Impala, the file path has an extraneous '/':
   ```
     "data_file":{
       "file_path":"hdfs://nn:8020/hive_warehouse/iceberg_table/data//ad4ebfa17d4a94ed-9bac8cde00000000_1968527942_data.0.parq",
       "file_format":"PARQUET",
       "partition":{},
       ...
     }
   ```
   Note the extraneous '/' before the filename.
   Because of this, the `InputFile` is not found in `org.apache.iceberg.spark.source.RowDataReader#open(FileScanTask)`. That method calls `BaseDataReader#getInputFile(FileScanTask)`, which looks up the path of the file to scan (exactly as it appears in the manifest) in a `Map<String, InputFile>`. The keys of that map are normalized paths, so the non-normalized path from the manifest does not match any key.
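
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not Iceberg's code: the class `PathLookupSketch`, the helpers `normalize` and `buildIndex`, and the use of a plain `String` in place of `InputFile` are all assumptions made for the example. It only shows that a map keyed by normalized paths misses when queried with the raw manifest path, and hits when the same normalization is applied to the lookup key.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: names and structure are hypothetical, not Iceberg's API.
class PathLookupSketch {

    // Hypothetical helper: collapse repeated '/' in the path component only,
    // leaving the "scheme://authority" prefix untouched.
    // Assumes a hierarchical URI (one that has a path component).
    static String normalize(String location) {
        try {
            URI uri = new URI(location);
            String path = uri.getPath().replaceAll("/{2,}", "/");
            return new URI(uri.getScheme(), uri.getAuthority(), path,
                    uri.getQuery(), uri.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + location, e);
        }
    }

    // Stand-in for the Map<String, InputFile>: keys are normalized paths,
    // values are placeholder strings instead of real InputFile objects.
    static Map<String, String> buildIndex(String... manifestPaths) {
        Map<String, String> index = new HashMap<>();
        for (String p : manifestPaths) {
            index.put(normalize(p), "InputFile(" + normalize(p) + ")");
        }
        return index;
    }

    public static void main(String[] args) {
        String manifestPath = "hdfs://nn:8020/hive_warehouse/iceberg_table/data//"
                + "ad4ebfa17d4a94ed-9bac8cde00000000_1968527942_data.0.parq";
        Map<String, String> inputFiles = buildIndex(manifestPath);

        // Lookup with the raw manifest path: the key does not match.
        System.out.println("raw lookup:        " + inputFiles.get(manifestPath));

        // Lookup after applying the same normalization: the key matches.
        System.out.println("normalized lookup: " + inputFiles.get(normalize(manifestPath)));
    }
}
```

One possible direction, under these assumptions, is to apply the same normalization on both sides of the lookup, so that the key used to populate the map and the key used to query it always agree.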


