justinleet opened a new pull request #839: HIVE-21894: Hadoop credential password storage for the Kafka Storage handler when security is SSL
URL: https://github.com/apache/hive/pull/839

[HIVE-21894](https://issues.apache.org/jira/browse/HIVE-21894)

Allows the KafkaStorageHandler to be configured with SSL properties without the passwords being stored in plaintext in the table configs.

This has been tested on an actual Hadoop cluster against an actual Kafka cluster, but in a pretty limited manner and primarily on the consumer side (full disclosure: my use case is almost exclusively reads). I've done some basic testing to make sure that both queries that don't spin up jobs (e.g. simple `SELECT *` type queries) and queries that do spin up jobs (e.g. some basic `GROUP BY`) run successfully.

There are a couple of things that probably need some feedback and possibly iteration.

- Distribution of the key/trust stores. Kafka can only work with these stores locally, but they need to be distributed for jobs, so HDFS seems like the right place to keep them. Right now, each store is an HDFS file that is pulled via the standard HDFS APIs into `DOWNLOADED_RESOURCES_DIR`. Other StorageHandlers (see: [HIVE-21894](https://issues.apache.org/jira/browse/HIVE-21894?focusedCommentId=16869476&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16869476)) do deal with files, but those seem to be jars and go through the `tmpjars` config (which I believe is essentially just `-libjars`).
  - Is this the right place to put the files?
  - Is there a more reasonable way to get them?
- Right now, producer / consumer SSL configs are assumed to be the same (i.e. `hive.kafka.ssl.keystore.password` instead of `hive.kafka.consumer.ssl ...` and `hive.kafka.producer.ssl ...`).
  - This could fairly easily be split out if there's a need. I'm honestly not sure how often a producer and a consumer would be configured separately in practice.
- Naming of the configs. If there are any particular conventions I should follow, let me know and I'll test and update.
- Automated testing. Given the need for HDFS and Kafka, I've just added some tests that check the configs end up reasonable, but we may want more, and I'm not familiar enough with Hive's testing utilities to know if there are better options.
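To make the shared versus split producer/consumer question concrete, here is a minimal sketch of how such a split could be layered on top of the shared keys. It is not code from this PR: the class and method names are hypothetical, the `hive.kafka.ssl.keystore.location` key is an assumed name, and the idea that a role-specific key overrides the shared one is just one possible design. Only the Kafka client key names (`ssl.keystore.location` and friends) are the standard ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the PR.
final class KafkaSslConfigSketch {

    // Builds Kafka client SSL properties for a role ("consumer" or "producer").
    // Shared "hive.kafka.ssl.*" keys are applied first, then any assumed
    // "hive.kafka.<role>.ssl.*" keys override them.
    static Properties clientSslProperties(Map<String, String> tableProps, String role) {
        Properties out = new Properties();
        copyWithPrefix(tableProps, "hive.kafka.ssl.", out);
        copyWithPrefix(tableProps, "hive.kafka." + role + ".ssl.", out);
        return out;
    }

    // Copies every entry whose key starts with the prefix, renaming
    // "<prefix>foo" to the Kafka client key "ssl.foo".
    private static void copyWithPrefix(Map<String, String> src, String prefix, Properties out) {
        for (Map.Entry<String, String> e : src.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.setProperty("ssl." + e.getKey().substring(prefix.length()), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new LinkedHashMap<>();
        // Assumed key names and made-up paths, for illustration.
        tableProps.put("hive.kafka.ssl.keystore.location", "/shared/keystore.jks");
        tableProps.put("hive.kafka.consumer.ssl.keystore.location", "/consumer/keystore.jks");
        System.out.println(clientSslProperties(tableProps, "consumer"));
    }
}
```

With this layering, tables that only set the shared keys keep working unchanged, and a role-specific key is needed only where the producer and consumer actually differ.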