[ 
https://issues.apache.org/jira/browse/HIVE-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942066#comment-15942066
 ] 

Pengcheng Xiong commented on HIVE-13275:
----------------------------------------

Hello, I am deferring this to Hive 3.0 as we are going to cut the first RC and 
it is not marked as a blocker. Please feel free to commit to the branch if this 
can be resolved before the release.

> Add a toString method to BytesRefArrayWritable
> ----------------------------------------------
>
>                 Key: HIVE-13275
>                 URL: https://issues.apache.org/jira/browse/HIVE-13275
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats, Serializers/Deserializers
>    Affects Versions: 1.1.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Trivial
>         Attachments: HIVE-13275.000.patch
>
>
> RCFileInputFormat cannot be used externally for Hadoop Streaming today because 
> Streaming generally relies on the K/V pairs being able to emit text 
> representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using 
> RCFileInputFormat prints default object representations, which are not useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an 
> array), so it's important to output them in a valid/parseable manner, as 
> opposed to choosing a simple joining delimiter over the string 
> representations of the inner elements.
> I propose adding a standardised CSV formatting of the array data, such that 
> users of Streaming can then parse the results in their own script. Since we 
> have OpenCSV as a dependency already, we can make use of it for this purpose.
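
To illustrate why CSV quoting is preferable to a plain joining delimiter, here is a minimal sketch of the formatting step. It is not the attached patch: the class CsvRowSketch and the method toCsvLine are hypothetical names, the row is modelled as a plain List<byte[]> rather than a BytesRefArrayWritable, the bytes are assumed to be UTF-8 text, and the quoting is hand-written instead of delegated to OpenCSV so that the example stays self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hive or the patch.
final class CsvRowSketch {

    // Renders one row (a list of raw column byte arrays) as a single CSV line.
    // Assumption: each column holds UTF-8 encoded text.
    static String toCsvLine(List<byte[]> columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String field = new String(columns.get(i), StandardCharsets.UTF_8);
            // Quote every field and double any embedded quote characters, so
            // that delimiters inside a value cannot be mistaken for separators.
            sb.append('"').append(field.replace("\"", "\"\"")).append('"');
        }
        return sb.toString();
    }
}
```

With this sketch, a column containing a comma stays inside its own quoted field, so a Streaming script can split the line with any standard CSV parser instead of guessing at a delimiter.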



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
