[jira] [Comment Edited] (TAJO-1388) [Umbrella] Kafka Storage Integration.

YeonSu Han (JIRA) Thu, 12 Mar 2015 14:31:23 -0700

    [ 
https://issues.apache.org/jira/browse/TAJO-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359453#comment-14359453
 ]


YeonSu Han edited comment on TAJO-1388 at 3/12/15 9:30 PM:
-----------------------------------------------------------

Hi Hyunsik,
 Thanks for your question.
 Strictly speaking, what I am proposing is Tajo on Kafka. This does not mean 
that it supports continuous stream processing, only snapshot processing. 
It still enables near-real-time analysis through short-interval batch 
queries, because Kafka allows low-latency processing.
 So Tajo does not need to manage the offsets of a topic's partitions. Each 
query simply scans all data from the beginning offset to the last offset.
 Time-window-based processing, as in CQL, is not part of this goal. It is a 
next goal, and it may belong in a Tajo ecosystem project rather than in Tajo 
itself.

I have attached a design draft. Please review it and give me your opinion.
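
The snapshot scan described above can be sketched roughly as follows. This 
is only an illustration under stated assumptions: the topic is an in-memory 
dictionary rather than a real Kafka topic, the beginning offset is taken to 
be 0, and the names Topic, take_snapshot and scan are hypothetical. None of 
them come from Tajo or from the Kafka client API.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a Kafka topic:
# partition id -> ordered list of messages (list index plays the role of offset).
Topic = Dict[int, List[str]]


def take_snapshot(topic: Topic) -> Dict[int, Tuple[int, int]]:
    """Record the [begin, end) offsets of every partition at query start.

    Assumption: the beginning offset is 0. On a real broker it can be
    larger once old messages have been removed by retention.
    """
    return {pid: (0, len(msgs)) for pid, msgs in topic.items()}


def scan(topic: Topic, snapshot: Dict[int, Tuple[int, int]]) -> List[str]:
    """Read every message inside the snapshot bounds, partition by partition."""
    rows: List[str] = []
    for pid in sorted(snapshot):
        begin, end = snapshot[pid]
        rows.extend(topic[pid][begin:end])
    return rows


topic: Topic = {0: ["a", "b"], 1: ["c"]}

# First query: fix the offset range, then scan only inside that range.
snap = take_snapshot(topic)
topic[1].append("d")  # arrives after the query started, so it is not scanned
first_query = scan(topic, snap)

# Next query: no offsets are remembered, so it rescans from the beginning.
second_query = scan(topic, take_snapshot(topic))
```

Because no offset state is kept between queries, each query sees the whole 
topic as it existed when that query started.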

Thanks~



> [Umbrella] Kafka Storage Integration.
> -------------------------------------
>
>                 Key: TAJO-1388
>                 URL: https://issues.apache.org/jira/browse/TAJO-1388
>             Project: Tajo
>          Issue Type: New Feature
>          Components: storage
>            Reporter: YeonSu Han
>            Assignee: YeonSu Han
>              Labels: kafka_storage
>         Attachments: Kafka _Storage_Ingegration_draft.pdf
>
>
> Apache Kafka is a widely used message queueing system. If we can use 
> Kafka as Tajo storage, the range of analyses available to Tajo users will 
> broaden, for example to real-time analysis. 
> For this, I propose a 'Kafka storage'. Please review my proposal and give your 
> opinion.
> * Table Creation
> {code:sql}
> CREATE [EXTERNAL] TABLE [IF NOT EXISTS] <table_name> [(<column_name>
> <data_type>, ... )]
> USING kafka WITH 
> ('kafka.topic'='<kafka_topic_name>', 'kafka.zk'='<kafka_zookeeper_info>', [other
>  options])
> {code}
> ** Use the "kafka" keyword in the "using" clause to create a Kafka table in Tajo.
> ** The 'kafka.topic' property maps a Kafka topic to the Tajo table name.
> * Column mapping of kafka message
> ** Delimited line mapping (default)
> ** json mapping
> ** ...
> * Concept
> ** A Kafka topic corresponds to a table.
> ** A Kafka partition corresponds to a file.
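
The two column mappings named in the description (delimited line and JSON) 
could look roughly like the sketch below. It is an illustration only: the 
COLUMNS schema, the "|" delimiter and the function names map_delimited and 
map_json are assumptions made for this example, not part of the proposal or 
of Tajo's API.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical table schema, used only for this example.
COLUMNS: List[str] = ["id", "name", "score"]


def map_delimited(message: str, delimiter: str = "|") -> Dict[str, Optional[str]]:
    """Split one message on the delimiter and assign fields to columns by position.

    Columns without a matching field are set to None.
    """
    fields = message.split(delimiter)
    return {
        col: (fields[i] if i < len(fields) else None)
        for i, col in enumerate(COLUMNS)
    }


def map_json(message: str) -> Dict[str, Any]:
    """Parse one message as a JSON object and assign values to columns by key.

    Columns whose key is absent from the object are set to None.
    """
    record = json.loads(message)
    return {col: record.get(col) for col in COLUMNS}


delimited_row = map_delimited("1|kim|90")
json_row = map_json('{"id": 1, "name": "kim"}')
```

The delimited mapping is positional and yields strings, while the JSON 
mapping is keyed by column name and keeps the JSON value types.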



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
