[ 
https://issues.apache.org/jira/browse/DRILL-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248355#comment-16248355
 ] 

ASF GitHub Bot commented on DRILL-4779:
---------------------------------------

Github user akumarb2010 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1027#discussion_r150376174
  
    --- Diff: contrib/storage-kafka/README.md ---
    @@ -0,0 +1,230 @@
    +# Drill Kafka Plugin
    +
    +The Drill Kafka storage plugin allows you to perform interactive analysis using SQL against Apache Kafka.
    +
    +<h4 id="Supported kafka versions">Supported Kafka Version</h4>
    +Kafka-0.10 and above </p>
    +
    +<h4 id="Supported Message Formats">Message Formats</h4>
    +Currently, this plugin supports reading only Kafka messages in <strong>JSON</strong> format.
    +
    +
    +<h4>Message Readers</h4>
    +<p>Message Readers are used for reading messages from Kafka. The message readers supported as of now are:</p>
    +
    +<table style="width:100%">
    +  <tr>
    +    <th>MessageReader</th>
    +    <th>Description</th>
    +    <th>Key DeSerializer</th> 
    +    <th>Value DeSerializer</th>
    +  </tr>
    +  <tr>
    +    <td>JsonMessageReader</td>
    +    <td>To read Json messages</td>
    +    <td>org.apache.kafka.common.serialization.ByteArrayDeserializer</td> 
    +    <td>org.apache.kafka.common.serialization.ByteArrayDeserializer</td>
    +  </tr>
    +</table>
    +
    +
    +<h4 id="Plugin Configurations">Plugin Configurations</h4>
    +The Drill Kafka plugin supports the following properties:
    +<ul>
    +   <li><strong>kafkaConsumerProps</strong>: These are typical <a href="https://kafka.apache.org/documentation/#consumerconfigs">Kafka consumer properties</a>.</li>
    +<li><strong>drillKafkaProps</strong>: These are Drill Kafka plugin properties. As of now, the following properties are supported:
    +   <ul>
    +<li><strong>drill.kafka.message.reader</strong>: The message reader implementation to use while reading messages from Kafka. The message reader implementation should be configured based on the message format. Supported message readers:
    + <ul>
    + <li>org.apache.drill.exec.store.kafka.decoders.JsonMessageReader</li>
    + </ul>
    +</li>
    +<li><strong>drill.kafka.poll.timeout</strong>: Polling timeout used by the Kafka client while fetching messages from the Kafka cluster.</li>
    +</ul>
    +</li>
    +</ul>
    +
    +<h4 id="Plugin Registration">Plugin Registration</h4>
    +To register the Kafka plugin, open the Drill web interface by entering <strong>http://drillbit:8047/storage</strong> in your browser.
    +
    +<p>The following is an example plugin registration configuration</p>
    +<pre>
    +{
    +  "type": "kafka",
    +  "kafkaConsumerProps": {
    +    "key.deserializer": 
"org.apache.kafka.common.serialization.ByteArrayDeserializer",
    +    "auto.offset.reset": "earliest",
    +    "bootstrap.servers": "localhost:9092",
    +    "enable.auto.commit": "true",
    +    "group.id": "drill-query-consumer-1",
    +    "value.deserializer": 
"org.apache.kafka.common.serialization.ByteArrayDeserializer",
    +    "session.timeout.ms": "30000"
    +  },
    +  "drillKafkaProps": {
    +    "drill.kafka.message.reader": 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
    +    "drill.kafka.poll.timeout": "2000"
    +  },
    +  "enabled": true
    +}
    +</pre>
    +
    +<h4 id="Abstraction"> Abstraction </h4>
    +<p>In Drill, each Kafka topic is mapped to a SQL table. When a query is issued on a table, the plugin scans all the messages of that topic, from the earliest offset to the latest offset at that point in time. The plugin automatically discovers all the topics (tables), allowing you to perform analysis without executing DDL statements.
    --- End diff --
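
    For illustration of the abstraction described above (assuming the plugin is registered under the name `kafka` and a hypothetical topic named `clicks` exists), such a topic could be queried directly as a table, with no DDL required:

        SELECT * FROM kafka.`clicks` LIMIT 10;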
    
    This is a very valid point, Paul. The only issue is that in other storage plugins like Mongo, we are able to push these predicates down as filters to the storage layer, since they support predicate pushdown.

    But in the case of Kafka, we cannot push these filters down. To achieve this, we can create a specific KafkaSubScanSpec for the query range by parsing the predicates on kafkaMsgOffset. However, this needs some time for development and testing. Is it OK if we create a separate JIRA for this and release it in the next version?
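
    As a sketch of the kind of offset-range predicate being discussed (the topic name `clicks` is assumed for illustration; kafkaMsgOffset is the message-offset field referenced above), such a filter is currently evaluated only after scanning the topic from earliest to latest offset, whereas the proposed KafkaSubScanSpec approach would use it to bound the scan range:

        SELECT *
        FROM kafka.`clicks`
        WHERE kafkaMsgOffset BETWEEN 100 AND 200;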


> Kafka storage plugin support
> ----------------------------
>
>                 Key: DRILL-4779
>                 URL: https://issues.apache.org/jira/browse/DRILL-4779
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 1.11.0
>            Reporter: B Anil Kumar
>            Assignee: B Anil Kumar
>              Labels: doc-impacting
>             Fix For: 1.12.0
>
>
> Implementing a Kafka storage plugin will enable strong SQL support for Kafka.
> Initially, the implementation can target support for JSON and Avro message types.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
