[ 
https://issues.apache.org/jira/browse/HAWQ-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126075#comment-15126075
 ] 

ASF GitHub Bot commented on HAWQ-178:
-------------------------------------

Github user tzolov commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/302#issuecomment-177895864
  
    @hornn , @GodenYao 
    The `pxf-json` code was implemented by @adamjshook. In this PR I've merely 
ported it to the HAWQ PXF project structure. My idea was to port the existing 
code first and then improve it if needed.
    But the excellent comments above made me review the code, and I found a 
significant issue with the JsonRecordReader, namely its support for multiline 
JSON objects (also called Pretty Print, PP). The current implementation will 
not work when a JSON document spans multiple HDFS splits! 
    So I will remove the multiline-JSON code from the PR, leaving in only the 
LineRecordReader version (i.e. assuming one JSON object per line). 
    Also, I will open a discussion on the dev mailing list about how PXF 
should handle documents that span splits. 
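
    To illustrate the split problem described above, here is a minimal Python 
sketch. It is not the `pxf-json` code: the function names, the sample data, and 
the 20-byte split size are all assumptions made for illustration. The 
line-oriented reader follows the convention used by Hadoop's LineRecordReader 
(skip the first line of every split except the first, and finish the last line 
even if it runs past the split end). The naive reader matches braces using only 
the bytes of its own split, and it ignores braces inside string values for 
brevity.

```python
import json


def make_splits(total: int, size: int) -> list:
    """Cut the byte range [0, total) into consecutive (start, end) splits."""
    return [(s, min(s + size, total)) for s in range(0, total, size)]


def read_lines_in_split(data: bytes, start: int, end: int) -> list:
    """Line-oriented read of split [start, end), one JSON object per line."""
    pos = start
    if start != 0:
        # The line containing `start` is owned by the previous split's reader,
        # so discard everything up to and including the next newline.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # A line that begins at or before `end` belongs to this split,
    # even when its bytes continue past the split boundary.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        line = data[pos:stop]
        if line.strip():
            records.append(line)
        pos = stop + 1
    return records


def read_objects_naive(data: bytes, start: int, end: int) -> list:
    """Brace-matching read that never looks outside its own split."""
    chunk = data[start:end]
    records, depth, begin = [], 0, 0
    for i, byte in enumerate(chunk):
        ch = chr(byte)
        if ch == "{":
            if depth == 0:
                begin = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                records.append(chunk[begin:i + 1])
    # An object still open here is silently lost, and so is the tail of an
    # object that was opened in the previous split.
    return records


SPLIT_SIZE = 20  # hypothetical, tiny on purpose so that objects cross splits

one_per_line = b'{"id": 1}\n{"id": 2}\n{"id": 3}\n'
pretty = b'{\n  "id": 1\n}\n{\n  "id": 2\n}\n{\n  "id": 3\n}\n'

line_records = [
    rec
    for start, end in make_splits(len(one_per_line), SPLIT_SIZE)
    for rec in read_lines_in_split(one_per_line, start, end)
]
naive_records = [
    rec
    for start, end in make_splits(len(pretty), SPLIT_SIZE)
    for rec in read_objects_naive(pretty, start, end)
]

print("line reader ids: ", [json.loads(r)["id"] for r in line_records])
print("naive reader ids:", [json.loads(r)["id"] for r in naive_records])
```

    With this sample data the line-oriented reader returns every record exactly 
once, while the naive multiline reader drops each object that crosses a split 
boundary. A correct multiline reader would need a similar ownership rule, for 
example reading past the split end to finish an object it has started.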



> Add JSON plugin support in code base
> ------------------------------------
>
>                 Key: HAWQ-178
>                 URL: https://issues.apache.org/jira/browse/HAWQ-178
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Goden Yao
>            Assignee: Goden Yao
>             Fix For: backlog
>
>         Attachments: PXFJSONPluginforHAWQ2.0andPXF3.0.0.pdf
>
>
> JSON is a popular format in HDFS as well as in the community. 
> The community has developed a few JSON PXF plugins, and we'd 
> like to see them incorporated into the code base as an optional package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
