[GitHub] [iceberg] GrigorievNick opened a new issue #2917: Can Spark iceberg connector read from specific file?

GitBox Mon, 02 Aug 2021 08:07:06 -0700


GrigorievNick opened a new issue #2917:
URL: https://github.com/apache/iceberg/issues/2917



   I want to read a few files from a previous version of the table.
   I can find the files and their metadata by appending the `.files` suffix to 
the table name.
   ```
   val files = sparkSession
         .read
         .option("snapshot-id", previousVersion)
         .format("iceberg")
         .load(s"$testTable.files")
         // fileFilter: placeholder for a Column predicate that selects the files of interest
         .filter(fileFilter)
   ```
   I also see that in Spark3RewriteActions I can specify a list of files to 
read from the table using `FILE_SCAN_TASK_SET_ID`
   ```
       val manager = FileScanTaskSetManager.get
       manager.stageTasks(table, "fileGroupId", fileScanTasksList)
       sparkSession
         .read
         .format("iceberg")
         .option("snapshot-id", previous)
         // the set ID must match the one passed to stageTasks above
         .option(SparkReadOptions.FILE_SCAN_TASK_SET_ID, "fileGroupId")
         .load(testTable)
   ```
   What I can't find is how to create a `FileScanTask` from `$testTable.files` or 
from any other output.
   In the code, I see that `DataTableScan` uses `ManifestReadTask`, but I don't see 
any code that reads from one or a few predefined Iceberg files.
   Do I need to write my own `FileScanTask` implementation, or is there existing 
code that already does this?
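
   One possible approach, sketched here as an assumption rather than a confirmed 
answer: instead of building `FileScanTask` objects by hand, plan a scan against 
the previous snapshot and keep only the tasks whose data file path is one of the 
wanted files. `ReadSelectedFiles`, `selectByPath`, `readFiles`, `wantedPaths` and 
`setId` are hypothetical names; the sketch also assumes Scala 2.12 and that the 
staged task set alone determines which files are read.
   ```scala
   import org.apache.iceberg.Table
   import org.apache.iceberg.spark.{FileScanTaskSetManager, SparkReadOptions}
   import org.apache.spark.sql.{DataFrame, SparkSession}
   import scala.collection.JavaConverters._

   object ReadSelectedFiles {

     // Keep only the items whose path is in `wanted`.
     // Generic so the selection logic can be checked without a real table.
     def selectByPath[T](items: Iterable[T], wanted: Set[String])(pathOf: T => String): List[T] =
       items.filter(item => wanted.contains(pathOf(item))).toList

     // Hypothetical helper: stage the tasks for `wantedPaths` and read them.
     def readFiles(
         spark: SparkSession,
         table: Table,
         tableName: String,
         snapshotId: Long,
         wantedPaths: Set[String],
         setId: String): DataFrame = {
       // Plan one task per data file in the chosen snapshot.
       val planned = table.newScan().useSnapshot(snapshotId).planFiles()
       val tasks =
         try selectByPath(planned.asScala, wantedPaths)(_.file().path().toString)
         finally planned.close()

       // Stage the selected tasks under setId, then read them back by that ID.
       FileScanTaskSetManager.get().stageTasks(table, setId, tasks.asJava)
       spark.read
         .format("iceberg")
         .option(SparkReadOptions.FILE_SCAN_TASK_SET_ID, setId)
         .load(tableName)
     }
   }
   ```
   The wanted paths could come from the `.files` query above, for example by 
collecting its `file_path` column into a `Set[String]`. Because the returned 
DataFrame is lazy, the staged tasks would need to stay registered until an 
action has actually run.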




