[
https://issues.apache.org/jira/browse/HAWQ-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126703#comment-15126703
]
ASF GitHub Bot commented on HAWQ-178:
-------------------------------------
Github user hornn commented on the pull request:
https://github.com/apache/incubator-hawq/pull/302#issuecomment-178107477
It sounds very similar to CSV with quoted data, which is not splittable.
Today we handle that case by making sure each file is processed by a single
accessor, even if the file actually consists of multiple splits (see the
HdfsTextMulti profile and
[QuotedLineBreakAccessor](https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/QuotedLineBreakAccessor.java)).
The problem, of course, is that we lose parallelism within a file, and with it performance.
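
To make the trade-off concrete, here is a rough sketch of the "one reader per file" idea. It is not the PXF accessor API: `SingleReaderSketch`, `Split`, `ownsWholeFile` and `filesToRead` are hypothetical names, and the rule that the split starting at offset 0 is the one that reads the file is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "one reader per file"; not the PXF accessor API. */
public class SingleReaderSketch {

    /** A byte range of a file, as a split-based scheduler might hand it out. */
    public static final class Split {
        final String path;
        final long start;
        final long length;

        public Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Assumption: the split that starts at offset 0 reads the entire file,
     * and every other split of that file produces no records.
     */
    public static boolean ownsWholeFile(Split split) {
        return split.start == 0;
    }

    /** Returns the paths that will actually be read, one entry per file. */
    public static List<String> filesToRead(List<Split> splits) {
        List<String> result = new ArrayList<>();
        for (Split s : splits) {
            if (ownsWholeFile(s)) {
                result.add(s.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("a.csv", 0, 128));
        splits.add(new Split("a.csv", 128, 128)); // same file, second split: skipped
        splits.add(new Split("b.csv", 0, 64));
        System.out.println(filesToRead(splits)); // prints [a.csv, b.csv]
    }
}
```

However many splits `a.csv` has, only one worker reads it, which is where the parallelism goes.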
@tzolov, if it requires too much rewriting, I agree that we can do it in
two stages: first the splittable case (one record per line), and then the more
complex cases.
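
For the splittable case, a minimal sketch of why one record per line can be read in parallel: each reader skips the partial first line of its range (unless it starts at offset 0) and reads one line past its end, which is the boundary convention used by Hadoop's line-oriented text input as far as I know. `LineSplitSketch` and `readSplit` are hypothetical names, and the in-memory byte array stands in for an HDFS file.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of split-aware reading of one-record-per-line data. */
public class LineSplitSketch {

    /** Index of the next '\n' at or after 'from', or data.length if none. */
    static int lineEnd(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return i;
    }

    /** Returns the records owned by the byte range [start, start + length). */
    public static List<String> readSplit(byte[] data, int start, int length) {
        int end = Math.min(start + length, data.length);
        int pos = start;
        if (start != 0) {
            // The previous split reads the line crossing this boundary, so skip it.
            pos = lineEnd(data, start) + 1;
        }
        List<String> records = new ArrayList<>();
        // 'pos <= end' lets this split finish the line that crosses its end.
        while (pos <= end && pos < data.length) {
            int eol = lineEnd(data, pos);
            records.add(new String(data, pos, eol - pos, StandardCharsets.UTF_8));
            pos = eol + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n".getBytes(StandardCharsets.UTF_8);
        // Cut the 27-byte input in the middle of the second record.
        System.out.println(readSplit(data, 0, 13));  // prints [{"id":1}, {"id":2}]
        System.out.println(readSplit(data, 13, 14)); // prints [{"id":3}]
    }
}
```

Every record is returned exactly once regardless of where the cut falls. A JSON object that spans several lines breaks this scheme, which is why those cases belong to the second stage.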
@adamjshook, good to hear from you.
> Add JSON plugin support in code base
> ------------------------------------
>
> Key: HAWQ-178
> URL: https://issues.apache.org/jira/browse/HAWQ-178
> Project: Apache HAWQ
> Issue Type: New Feature
> Components: PXF
> Reporter: Goden Yao
> Assignee: Goden Yao
> Fix For: backlog
>
> Attachments: PXFJSONPluginforHAWQ2.0andPXF3.0.0.pdf
>
>
> JSON is a popular format in HDFS as well as in the community. A few JSON PXF
> plugins have been developed by the community, and we'd like to see one
> incorporated into the code base as an optional package.