[ https://issues.apache.org/jira/browse/HAWQ-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126703#comment-15126703 ]

ASF GitHub Bot commented on HAWQ-178:
-------------------------------------

Github user hornn commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/302#issuecomment-178107477
  
    It sounds very similar to CSV with quoted data, which is not splittable.
The way we handle that today is by ensuring that each file is processed by a
single accessor, even if it actually consists of multiple splits (see the
HdfsTextMulti profile and
[QuotedLineBreakAccessor](https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/QuotedLineBreakAccessor.java)).
The problem, of course, is that we lose parallelism and performance.
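A minimal Java sketch of why quoted data defeats splitting, assuming CSV where a double-quoted field may contain a line break. A quote-aware scan only works if it starts at the beginning of the file; a reader that starts at an arbitrary split offset cannot know whether it is inside a quoted field. `QuotedRecordSplitter` and its method are hypothetical illustration names, not PXF classes, and this is not how `QuotedLineBreakAccessor` is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of PXF.
final class QuotedRecordSplitter {

    // Splits text into records, treating a newline inside double quotes as part
    // of the field rather than as a record terminator. A doubled "" escape
    // toggles the quote state twice, so it nets out.
    static List<String> quoteAwareRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == '\n' && !inQuotes) {
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "1,\"first line\nsecond line\"\n2,plain\n";
        // Scanning from the start of the file: 2 correct records.
        System.out.println(quoteAwareRecords(csv).size());
        // Scanning from a split offset that lands inside the quoted field:
        // the closing quote is misread as an opening one, and everything
        // after it collapses into a single garbled record.
        String tail = csv.substring(csv.indexOf('\n') + 1);
        System.out.println(quoteAwareRecords(tail).size());
    }
}
```

This is why the safe option is one accessor per file: only a reader that starts at byte 0 knows the quote state.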
    @tzolov, if it requires too much rewriting, I agree that we can do it in
two stages: first the splittable case (one record per line), and then the more
complex cases.
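For the splittable case, a sketch of a common line-ownership convention, assuming one JSON object per line: a split `[start, end)` owns every line whose first byte falls inside it, skips a partial first line (the previous split owns it), and finishes its last line even if that line runs past `end`. Single-line JSON records are safe here because JSON requires control characters such as newline to be escaped inside strings. `LineSplitReader.readSplit` is a hypothetical helper, not a PXF or Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of PXF.
final class LineSplitReader {

    // Returns the lines owned by the byte range [start, end): every line whose
    // first byte falls inside the range. A line may extend past `end`; this
    // reader finishes it, and the next split skips it.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        // Not at a line start: skip the partial line, it belongs to the
        // previous split.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step past the newline
        }
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') lineEnd++;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical one-JSON-object-per-line input (27 bytes).
        byte[] data = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n".getBytes(StandardCharsets.UTF_8);
        int mid = data.length / 2; // arbitrary split point, lands mid-record
        System.out.println(readSplit(data, 0, mid));           // [{"id":1}, {"id":2}]
        System.out.println(readSplit(data, mid, data.length)); // [{"id":3}]
    }
}
```

Each record is read exactly once regardless of where the split boundary falls; pretty-printed JSON that spans several lines breaks this assumption, which is what the second stage would have to handle.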
    @adamjshook, good to hear from you :)


> Add JSON plugin support in code base
> ------------------------------------
>
>                 Key: HAWQ-178
>                 URL: https://issues.apache.org/jira/browse/HAWQ-178
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Goden Yao
>            Assignee: Goden Yao
>             Fix For: backlog
>
>         Attachments: PXFJSONPluginforHAWQ2.0andPXF3.0.0.pdf
>
>
> JSON is a popular format in HDFS as well as in the community. A few JSON
> PXF plugins have been developed by the community, and we'd like to see JSON
> support incorporated into the code base as an optional package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
