[ 
https://issues.apache.org/jira/browse/FLUME-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491442#comment-13491442
 ] 

alex gemini commented on FLUME-1669:
------------------------------------

Usually the block size of a SequenceFile or Avro file is quite small, whereas a
columnar format only pays off when the block size is large; a few GB is usually
the minimum (see the Trevni spec, "Design" section, line 2). It's not practical
to hold that much data in memory, given the risk of a service crash or a
configuration reload. It's better to write SequenceFile or Avro files to a
directory and then, once Flume rolls to the next directory, merge that directory
into the columnar format.

Another thing to note is that query engines (Hive, Pig and others) currently
don't support a single directory that contains two different file formats,
although Hive does support a table whose partitions use different file formats.
So I think Flume should perhaps monitor two directories. One is the directory
currently being written, into which multiple writers write small Avro or
SequenceFile files; when the data stream rolls to the next directory, Flume
merges those Avro or SequenceFile files into the Trevni columnar format,
possibly using MapReduce.
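The roll-then-merge layout proposed above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not Flume code: the class TwoDirectoryRoller, the function merge_to_columnar, the hourly bucket naming, and the JSON-lines part files are all assumptions standing in for Avro or SequenceFile writers. The "columnar" output is only an in-memory column-major dict, not a real Trevni file; the actual conversion would be done by a MapReduce job or the Trevni library.

```python
import json
import os
from datetime import datetime, timezone


class TwoDirectoryRoller:
    """Hypothetical sketch: write small row-oriented files into the current
    time-bucketed directory; when the stream rolls to the next bucket, queue
    the finished directory for conversion to a columnar format."""

    def __init__(self, root, bucket_format="%Y%m%d%H"):
        self.root = root
        self.bucket_format = bucket_format  # assumed hourly buckets
        self.current = None                 # directory currently being written
        self.pending_merge = []             # closed directories awaiting merge

    def _bucket_dir(self, ts):
        bucket = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(self.bucket_format)
        return os.path.join(self.root, bucket)

    def write(self, writer_id, event, ts):
        # Assumes event timestamps arrive in non-decreasing order.
        target = self._bucket_dir(ts)
        if target != self.current:
            # The stream rolled into a new bucket: hand the old one off.
            if self.current is not None:
                self.pending_merge.append(self.current)
            os.makedirs(target, exist_ok=True)
            self.current = target
        # Each writer appends to its own small row-oriented file; JSON lines
        # stand in for an Avro or SequenceFile container here.
        path = os.path.join(target, f"part-{writer_id}.rows")
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")


def merge_to_columnar(directory):
    """Read every row file in a closed directory and transpose the rows into
    column-major form ({field: [values...]}). A real job would write a Trevni
    file instead of returning a dict."""
    rows = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".rows"):
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                rows.extend(json.loads(line) for line in f if line.strip())
    fields = sorted({key for row in rows for key in row})
    # Missing fields become None so every column has one entry per row.
    return {field: [row.get(field) for row in rows] for field in fields}
```

In this sketch only closed directories are ever converted, so a crash or configuration reload affects just the small row-oriented files still being appended, while the large columnar output is built from data that is already safely on disk.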
                
> Add support for columnar event serializer in HDFS
> -------------------------------------------------
>
>                 Key: FLUME-1669
>                 URL: https://issues.apache.org/jira/browse/FLUME-1669
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>            Reporter: Mubarak Seyed
>            Assignee: Mubarak Seyed
>              Labels: noob
>             Fix For: v1.4.0
>
>
> Motivation:
> Columnar storage is preferred for better performance and compression for 
> low-latency analytical workloads. Avro 1.7.2 supports a column-major file
> format [1], so we can implement {{AbstractTrevniAvroEventSerializer}} (like
> {{AbstractAvroEventSerializer}}), and {{HDFSSink}} can then offer a serializer
> type that stores events in the Trevni column-major file format.
> [1]    http://avro.apache.org/docs/current/trevni/spec.html
>        https://github.com/cutting/trevni

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
