[ https://issues.apache.org/jira/browse/HIVE-21869?focusedWorklogId=263509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-263509 ]
ASF GitHub Bot logged work on HIVE-21869:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jun/19 03:51
            Start Date: 20/Jun/19 03:51
    Worklog Time Spent: 10m

Work Description: b-slim commented on pull request #677: HIVE-21869 Clean up Kafka storage handler examples
URL: https://github.com/apache/hive/pull/677#discussion_r295620689

##########
File path: kafka-handler/README.md
##########

@@ -1,42 +1,71 @@
 # Kafka Storage Handler Module
-Storage Handler that allows user to Connect/Analyse/Transform Kafka topics.
-The workflow is as follow, first the user will create an external table that is a view over one Kafka topic,
-then the user will be able to run any SQL query including write back to the same table or different kafka backed table.
+Storage Handler that allows users to connect/analyze/transform Kafka topics.
+The workflow is as follows:
+- First, the user will create an external table that is a view over one Kafka topic
+- Second, the user will be able to run any SQL query including write back to the same table or different Kafka backed table
+
+## Kafka Management
+
+Kafka Java client version: 2.x
+
+This handler does not commit offsets of topic partition reads either using the intrinsic Kafka capability or in an external
+storage. This means a query over a Kafka topic backed table will be a full topic read unless partitions are filtered
+manually, via SQL, by the methods described below. In the ETL section, a method for storing topic offsets in Hive tables
+is provided for tracking consumer position but this is not a part of the handler itself.
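As a side note on the manual partition filtering the new README text refers to: the sketch below shows what such a filtered query could look like. The metadata column names `__partition` and `__offset`, and the offset bounds, are assumptions for illustration and are not part of the diff under review.

```sql
-- Sketch only: restrict the scan to one partition and an offset range
-- so the query is not a full topic read. `kafka_table`, `__partition`
-- and `__offset` are assumed names, not taken from this patch.
SELECT `timestamp`, `page`, `added`
FROM kafka_table
WHERE `__partition` = 0
  AND `__offset` BETWEEN 1000 AND 2000;
```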
 ## Usage

 ### Create Table
-Use following statement to create table:
+Use the following statement to create a table:
+
 ```sql
-CREATE EXTERNAL TABLE kafka_table
-(`timestamp` timestamp , `page` string, `newPage` boolean,
- added int, deleted bigint, delta double)
-STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
-TBLPROPERTIES
-("kafka.topic" = "test-topic", "kafka.bootstrap.servers"="localhost:9092");
+CREATE EXTERNAL TABLE
+  kafka_table (
+    `timestamp` TIMESTAMP,
+    `page` STRING,
+    `newPage` BOOLEAN,
+    `added` INT,
+    `deleted` BIGINT,
+    `delta` DOUBLE)
+STORED BY
+  'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
+TBLPROPERTIES (
+  "kafka.topic" = "test-topic",
+  "kafka.bootstrap.servers" = "localhost:9092");
 ```
-Table property `kafka.topic` is the Kafka Topic to connect to and `kafka.bootstrap.servers` is the Broker connection string.
+
+The table property `kafka.topic` is the Kafka topic to connect to and `kafka.bootstrap.servers` is the Kafka broker connection string.
 Both properties are mandatory.
-On the write path if such a topic does not exists the topic will be created if Kafka broker admin policy allow such operation.
+On the write path if such a topic does not exist the topic will be created if Kafka broker admin policy allows for
+auto topic creation.
+
+By default the serializer and deserializer is JSON, specifically `org.apache.hadoop.hive.serde2.JsonSerDe`.
+
+If you want to change the serializer/deserializer classes you can update the TBLPROPERTIES with SQL syntax `ALTER TABLE`.
-By default the serializer and deserializer is Json `org.apache.hadoop.hive.serde2.JsonSerDe`.
-If you want to switch serializer/deserializer classes you can use alter table.
 ```sql
-ALTER TABLE kafka_table SET TBLPROPERTIES ("kafka.serde.class"="org.apache.hadoop.hive.serde2.avro.AvroSerDe");
-```
-List of supported Serializer Deserializer:
+ALTER TABLE
+  kafka_table
+SET TBLPROPERTIES (
+  "kafka.serde.class" = "org.apache.hadoop.hive.serde2.avro.AvroSerDe");
+```
+
+List of supported serializers and deserializers:

-|Supported Serializer Deserializer|
+|Supported Serializers and Deserializers|
 |-----|
 |org.apache.hadoop.hive.serde2.JsonSerDe|
 |org.apache.hadoop.hive.serde2.OpenCSVSerde|
-|org.apache.hadoop.hive.serde2.avro.AvroSerDe|
+|org.apache.hadoop.hive.serde2.avro.AvroSerDe*|

Review comment:
   Not sure what the `*` at the end is for.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 263509)
    Time Spent: 20m  (was: 10m)

> Clean up the Kafka storage handler readme and examples
> ------------------------------------------------------
>
>                 Key: HIVE-21869
>                 URL: https://issues.apache.org/jira/browse/HIVE-21869
>             Project: Hive
>          Issue Type: Improvement
>          Components: kafka integration
>    Affects Versions: 4.0.0
>            Reporter: Kristopher Kane
>            Assignee: Kristopher Kane
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21869.1.patch, HIVE-21869.2.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)