RussellSpitzer commented on issue #2917: URL: https://github.com/apache/iceberg/issues/2917#issuecomment-891122748
Reading specific files isn't really supported at the moment, although you may be able to repurpose the rewrite code to handle it. To do so, you would have to build `BaseFileScanTask`s, which you could do by converting the rows of the files metadata table into `DataFile`s (see `SparkDataFile`) and then handing those to the manager. One of the key issues is that the way we read the files depends on the table's state at that point in time, so we have to know which partition spec we are using; for the residual evaluator you can just plug in an always-true evaluator.

---

If you just want to read the files, it may make more sense to pass the file paths to the normal Spark reading code rather than going through the Iceberg DataSource. Something like https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L595:

`spark.read.parquet(files: _*)`
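The second suggestion above can be sketched as follows. This is a hypothetical snippet, not code from the issue: the example paths and the idea of collecting them from the table's `files` metadata table are assumptions, and the `spark.read.parquet` call is left commented out because it needs a live `SparkSession`.

```scala
// Hypothetical list of data-file paths. In practice you might collect these
// from Iceberg's `files` metadata table, e.g. something like:
//   spark.read.format("iceberg").load("db.tbl.files").select("file_path")
val files: Seq[String] = Seq(
  "s3://bucket/warehouse/db/tbl/data/part-00000.parquet",
  "s3://bucket/warehouse/db/tbl/data/part-00001.parquet"
)

// `files: _*` expands the Seq into the varargs parameter of
// DataFrameReader.parquet(paths: String*), reading the files directly
// with plain Spark and bypassing the Iceberg DataSource entirely:
// val df = spark.read.parquet(files: _*)
```

Note that reading this way loses Iceberg's schema evolution and partition metadata; you get exactly what is physically in the Parquet files.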
