[ https://issues.apache.org/jira/browse/HIVE-21869?focusedWorklogId=263509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-263509 ]
ASF GitHub Bot logged work on HIVE-21869:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jun/19 03:51
            Start Date: 20/Jun/19 03:51
    Worklog Time Spent: 10m

Work Description: b-slim commented on pull request #677: HIVE-21869 Clean up Kafka storage handler examples
URL: https://github.com/apache/hive/pull/677#discussion_r295620689

##########
File path: kafka-handler/README.md
##########

@@ -1,42 +1,71 @@
 # Kafka Storage Handler Module
-Storage Handler that allows user to Connect/Analyse/Transform Kafka topics.
-The workflow is as follow, first the user will create an external table that is a view over one Kafka topic,
-then the user will be able to run any SQL query including write back to the same table or different kafka backed table.
+Storage Handler that allows users to connect/analyze/transform Kafka topics.
+The workflow is as follows:
+- First, the user will create an external table that is a view over one Kafka topic
+- Second, the user will be able to run any SQL query including write back to the same table or different Kafka backed table
+
+## Kafka Management
+
+Kafka Java client version: 2.x
+
+This handler does not commit offsets of topic partition reads either using the intrinsic Kafka capability or in an external
+storage. This means a query over a Kafka topic backed table will be a full topic read unless partitions are filtered
+manually, via SQL, by the methods described below. In the ETL section, a method for storing topic offsets in Hive tables
+is provided for tracking consumer position but this is not a part of the handler itself.
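As a side note on the manual partition filtering the new README text refers to: the sketch below shows what such a filtered query could look like. The metadata column names `__partition` and `__offset`, and the offset bounds, are assumptions for illustration and are not part of the diff under review.

```sql
-- Sketch only: restrict the scan to one partition and an offset range
-- so the query is not a full topic read. `kafka_table`, `__partition`
-- and `__offset` are assumed names, not taken from this patch.
SELECT `timestamp`, `page`, `added`
FROM kafka_table
WHERE `__partition` = 0
  AND `__offset` BETWEEN 1000 AND 2000;
```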
 ## Usage

 ### Create Table
-Use following statement to create table:
+Use the following statement to create a table:
+
 ```sql
-CREATE EXTERNAL TABLE kafka_table
-(`timestamp` timestamp , `page` string, `newPage` boolean,
- added int, deleted bigint, delta double)
-STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
-TBLPROPERTIES
-("kafka.topic" = "test-topic", "kafka.bootstrap.servers"="localhost:9092");
+CREATE EXTERNAL TABLE
+  kafka_table (
+    `timestamp` TIMESTAMP,
+    `page` STRING,
+    `newPage` BOOLEAN,
+    `added` INT,
+    `deleted` BIGINT,
+    `delta` DOUBLE)
+STORED BY
+  'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
+TBLPROPERTIES (
+  "kafka.topic" = "test-topic",
+  "kafka.bootstrap.servers" = "localhost:9092");
 ```
-Table property `kafka.topic` is the Kafka Topic to connect to and `kafka.bootstrap.servers` is the Broker connection string.
+
+The table property `kafka.topic` is the Kafka topic to connect to and `kafka.bootstrap.servers` is the Kafka broker connection string.
 Both properties are mandatory.
-On the write path if such a topic does not exists the topic will be created if Kafka broker admin policy allow such operation.
+On the write path if such a topic does not exist the topic will be created if Kafka broker admin policy allows for
+auto topic creation.
+
+By default the serializer and deserializer is JSON, specifically `org.apache.hadoop.hive.serde2.JsonSerDe`.
+
+If you want to change the serializer/deserializer classes you can update the TBLPROPERTIES with SQL syntax `ALTER TABLE`.
-By default the serializer and deserializer is Json `org.apache.hadoop.hive.serde2.JsonSerDe`.
-If you want to switch serializer/deserializer classes you can use alter table.
 ```sql
-ALTER TABLE kafka_table SET TBLPROPERTIES ("kafka.serde.class"="org.apache.hadoop.hive.serde2.avro.AvroSerDe");
-```
-List of supported Serializer Deserializer:
+ALTER TABLE
+  kafka_table
+SET TBLPROPERTIES (
+  "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
+```
+
+List of supported serializers and deserializers:

-|Supported Serializer Deserializer|
+|Supported Serializers and Deserializers|
 |-----|
 |org.apache.hadoop.hive.serde2.JsonSerDe|
 |org.apache.hadoop.hive.serde2.OpenCSVSerde|
-|org.apache.hadoop.hive.serde2.avro.AvroSerDe|
+|org.apache.hadoop.hive.serde2.avro.AvroSerDe*|

Review comment:
   Not sure what the `*` at the end is for.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 263509)
    Time Spent: 20m  (was: 10m)

> Clean up the Kafka storage handler readme and examples
> ------------------------------------------------------
>
>                 Key: HIVE-21869
>                 URL: https://issues.apache.org/jira/browse/HIVE-21869
>             Project: Hive
>          Issue Type: Improvement
>          Components: kafka integration
>    Affects Versions: 4.0.0
>            Reporter: Kristopher Kane
>            Assignee: Kristopher Kane
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21869.1.patch, HIVE-21869.2.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)