yifeih commented on a change in pull request #107: Integrate encryption into 
datasource
URL: https://github.com/apache/incubator-iceberg/pull/107#discussion_r258654537
 
 

 ##########
 File path: spark/src/main/java/com/netflix/iceberg/spark/source/Reader.java
 ##########
 @@ -273,14 +282,23 @@ private Schema lazyExpectedSchema() {
     private Iterator<InternalRow> currentIterator = null;
     private Closeable currentCloseable = null;
     private InternalRow current = null;
+    private Iterator<InputFile> inputFiles = null;
 
 Review comment:
   Hmm, OK. I wasn't sure what the best way to do this was, because I need to get all 
the `InputFile`s into a single iterator to pipe them through the 
`EncryptionManager`, but then I'd have to group them back with their respective 
`FileScanTask`s. A few options that I thought of are:
   
   1. Create an Iterator that combines the two different iterators, ensuring 
that they're always used at the same time (see latest commit)
   2. Create a map from the `FileScanTask` to the `InputFile` locations, as 
well as a map from the `InputFile` location to the decrypted `InputFile`, and 
use those maps to associate the appropriate decrypted `InputFile` with the 
`FileScanTask`. That is a lot of bookkeeping, though, and a bit confusing. 
   3. Don't use the encryption batch API. This makes it pretty straightforward 
to keep everything as a single iterator. 
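
   To make option 1 concrete, here is a rough sketch of what I mean by 
combining the two iterators. `PairedIterator` is a hypothetical name, not 
something that exists in the codebase, and the strings in `main` are only 
stand-ins for `FileScanTask` and decrypted `InputFile` objects. It also assumes 
the batch decrypt call returns files in the same order as the tasks they came 
from, which I have not verified.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical helper: walks two iterators in lockstep and yields (left, right) pairs.
// Assumption: both sources produce elements in the same order and have the same length.
class PairedIterator<A, B> implements Iterator<Map.Entry<A, B>> {
  private final Iterator<A> left;
  private final Iterator<B> right;

  PairedIterator(Iterator<A> left, Iterator<B> right) {
    this.left = left;
    this.right = right;
  }

  @Override
  public boolean hasNext() {
    boolean hasLeft = left.hasNext();
    boolean hasRight = right.hasNext();
    // Fail loudly if one side runs out first, since the pairing would be wrong.
    if (hasLeft != hasRight) {
      throw new IllegalStateException("Iterators have different lengths");
    }
    return hasLeft;
  }

  @Override
  public Map.Entry<A, B> next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    // Advance both sides together so they can never drift apart.
    return new SimpleImmutableEntry<>(left.next(), right.next());
  }

  public static void main(String[] args) {
    // Placeholder values standing in for tasks and their decrypted files.
    List<String> tasks = Arrays.asList("task-1", "task-2");
    List<String> files = Arrays.asList("decrypted-1", "decrypted-2");

    Iterator<Map.Entry<String, String>> pairs =
        new PairedIterator<>(tasks.iterator(), files.iterator());
    while (pairs.hasNext()) {
      Map.Entry<String, String> pair = pairs.next();
      System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
  }
}
```

   The length check in `hasNext()` is there so that a mismatch between the two 
sides shows up as an exception instead of silently pairing a task with the 
wrong file.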
   
   Please let me know if I'm missing something though! 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
