rdsr commented on pull request #1267:
URL: https://github.com/apache/iceberg/pull/1267#issuecomment-666208956
> I run MR in local mode whereas this seems to be running in distributed
mode with YARN. I'll have to dig deeper but my guess is that `table` is null
because in that case the calls to `getSplits` and then `getRecordReader`
happens in two different processes.
> @rdsr, what do you think of this approach? One downside is the increase in
size of the serialized splits.
Hi @guilload, @massdosage. I was trying out an alternative way of passing
the required parameters. Instead of setting `TABLE_PATH`, `SCHEMA`,
etc. in the `InputFormat#getSplits` method, where they are not propagated to
the record readers on worker nodes, I tried setting the required parameters in
`org.apache.hadoop.hive.ql.metadata.HiveStorageHandler#configureInputJobProperties`.
From that method's javadoc:
> /**
>  * This method is called to allow the StorageHandlers the chance
>  * to populate the JobContext.getConfiguration() with properties that
>  * maybe be needed by the handler's bundled artifacts (ie InputFormat,
>  * SerDe, etc).
>  */
it looks like it may be the right method to do what we are trying to
achieve. Below are my modifications:
https://github.com/apache/iceberg/compare/master...rdsr:alternative_conf?expand=1
I've yet to try this out on a real cluster to confirm that it works in
distributed YARN mode, though. I plan to do that tomorrow.
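To make the idea concrete, here is a minimal, self-contained sketch of the approach. It stands in for a `HiveStorageHandler#configureInputJobProperties` override: Hive invokes that method on the client before job submission, and anything placed into the supplied job-properties map is merged into the job `Configuration`, which YARN ships to the worker processes. The class name, property keys (`iceberg.mr.table.location`, `iceberg.mr.table.schema`), and the simplified signature (a `Properties` in place of Hive's `TableDesc`) are all hypothetical illustrations, not the actual constants or types from the PR.

```java
import java.util.Map;
import java.util.Properties;

// Hedged sketch of the configureInputJobProperties approach.
// Hive's real signature is configureInputJobProperties(TableDesc, Map<String, String>);
// a plain Properties object stands in for TableDesc here to keep the sketch runnable.
public class IcebergStorageHandlerSketch {

    // Hypothetical property keys; the real PR defines its own constants.
    static final String TABLE_LOCATION = "iceberg.mr.table.location";
    static final String TABLE_SCHEMA = "iceberg.mr.table.schema";

    // Called on the client before job submission. Entries added to jobProperties
    // are copied into the job Configuration, so they are visible both to
    // InputFormat#getSplits on the client and to getRecordReader on the workers,
    // even though those run in different processes under YARN.
    public static void configureInputJobProperties(Properties tableProps,
                                                   Map<String, String> jobProperties) {
        jobProperties.put(TABLE_LOCATION, tableProps.getProperty("location"));
        jobProperties.put(TABLE_SCHEMA, tableProps.getProperty("schema"));
    }
}
```

The design point is that the serialized job configuration, not the split objects, carries the table metadata, which avoids the split-size increase mentioned above.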