justinleet opened a new pull request #839: HIVE-21894: Hadoop credential 
password storage for the Kafka Storage handler when security is SSL
URL: https://github.com/apache/hive/pull/839
 
 
   [HIVE-21894](https://issues.apache.org/jira/browse/HIVE-21894)
   
   Allows the KafkaStorageHandler to be configured with SSL properties without the passwords appearing in plaintext in the table configs.
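
   As a rough sketch of the overall approach (not the exact patch), the password lookup goes through Hadoop's credential provider API rather than a plaintext table property. The property name below is the one mentioned later in this description and could still change; the class and method are just illustrative:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

/**
 * Minimal sketch (not the actual patch) of resolving the keystore password through
 * Hadoop's credential provider API instead of reading it as plaintext.
 */
public class KafkaSslPasswordResolver {

  // Property name taken from this description; the final name may change.
  private static final String KEYSTORE_PASSWORD_KEY = "hive.kafka.ssl.keystore.password";

  /**
   * Configuration.getPassword() first consults any providers registered under
   * hadoop.security.credential.provider.path (e.g. a JCEKS store on HDFS) and only
   * falls back to a plaintext config value if no provider holds the alias.
   */
  public static char[] resolveKeystorePassword(Configuration conf) throws IOException {
    char[] password = conf.getPassword(KEYSTORE_PASSWORD_KEY);
    if (password == null) {
      throw new IOException("No value found for " + KEYSTORE_PASSWORD_KEY);
    }
    return password;
  }
}
```

   With that in place, the password can be provisioned out of band with something like `hadoop credential create hive.kafka.ssl.keystore.password -provider jceks://hdfs/...`, so it never has to appear in the table DDL.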
   
   This has been tested on an actual Hadoop cluster against an actual Kafka cluster, but in a fairly limited manner and primarily for the consumer side of things (full disclosure: my use case is almost exclusively reading from Kafka). I've done some basic testing to make sure that both queries that don't spin up jobs (e.g. simple `SELECT *` queries) and queries that do spin up jobs (e.g. a basic `GROUP BY`) run successfully.
   
   There are a couple of things that probably need some feedback and possibly some iteration.
   
   - Distribution of the key/trust stores. Kafka can only work with these stores locally, but they need to be distributed for jobs, so HDFS seems like the right place to keep them. Right now, it's an HDFS file that is pulled via the standard HDFS APIs into `DOWNLOADED_RESOURCES_DIR` (see the localization sketch after this list). There are other StorageHandlers (see [this comment on HIVE-21894](https://issues.apache.org/jira/browse/HIVE-21894?focusedCommentId=16869476&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16869476)) that do some handling of files, but they seem to deal with jars and go through the `tmpjars` config (which I believe is essentially just `-libjars`).
     - Is this the right place to put the files?
     - Is there a more reasonable way to get them?
   - Right now, producer/consumer SSL configs are assumed to be the same (i.e. `hive.kafka.ssl.keystore.password` instead of separate `hive.kafka.consumer.ssl ...` and `hive.kafka.producer.ssl ...` properties); see the mapping sketch after this list.
     - This could fairly easily be split out if there's a need. I'm honestly not sure how often configuring a producer and consumer separately would be used in practice.
   - Naming of the configs. If there are any particular conventions I should 
follow, let me know and I'll test and update.
   - Automated testing. Given the need for HDFS and Kafka, I've just added some tests checking that the configs end up reasonable, but we may want more, and I'm not familiar enough with Hive's testing utilities to know whether there are better options.
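
To make the key/trust store distribution bullet concrete, the localization step amounts to roughly the following (illustrative sketch only; the class and method names are mine, and `localDir` would be the session's `hive.downloaded.resources.dir`):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative sketch of the localization step: copy a key/trust store from HDFS
 * into a local directory so the Kafka client, which only reads local files, can use it.
 */
public class KeystoreLocalizer {

  /** Returns the local path to hand to Kafka's ssl.keystore.location / ssl.truststore.location. */
  public static String localize(Configuration conf, String hdfsStorePath, String localDir)
      throws IOException {
    Path src = new Path(hdfsStorePath);
    Path dst = new Path(localDir, src.getName());
    // Standard HDFS API copy; no Kafka- or Hive-specific machinery involved.
    FileSystem fs = src.getFileSystem(conf);
    fs.copyToLocalFile(src, dst);
    return dst.toString();
  }
}
```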
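
And for the shared producer/consumer namespace, the mapping is essentially one loop over both property sets (again an illustrative sketch; only `hive.kafka.ssl.keystore.password` comes from this description, the other property names are assumed):

```java
import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SslConfigs;

/**
 * Sketch of the shared namespace: one set of hive.kafka.ssl.* properties feeds
 * both the consumer and the producer configuration.
 */
public class SharedSslConfig {

  public static void applySsl(Configuration conf, Properties consumerProps, Properties producerProps)
      throws IOException {
    // Null checks omitted for brevity; assumes the locations and aliases are present.
    for (Properties props : new Properties[] {consumerProps, producerProps}) {
      props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
      props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, conf.get("hive.kafka.ssl.truststore.location"));
      props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, conf.get("hive.kafka.ssl.keystore.location"));
      // Passwords come from the credential provider (Configuration.getPassword), never plaintext.
      props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG,
          new String(conf.getPassword("hive.kafka.ssl.truststore.password")));
      props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG,
          new String(conf.getPassword("hive.kafka.ssl.keystore.password")));
    }
  }
}
```

Splitting producer and consumer configs apart would just mean reading `hive.kafka.consumer.ssl ...` / `hive.kafka.producer.ssl ...` keys inside that loop instead of a single shared set.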
