[ 
https://issues.apache.org/jira/browse/DRILL-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991442#comment-14991442
 ] 

Magnus Pierre commented on DRILL-3878:
--------------------------------------

Hello,
I have a simple implementation of a format converter that converts XML to JSON 
and runs it through Drill's JSONRecordReader, which works fine for the test data 
I have available. The concept works well and the performance is decent, but it 
builds the complete JSON document in memory before handing it over to the 
JSONRecordReader, which is an issue for larger documents. Currently I am 
using a home-grown SAX parser that builds the JSON document using org.json 
classes. However, there are DOM variants that can also do XSD validation and 
so on. In order to plug directly into JSONRecordReader without having to 
duplicate its code, the fields embeddedInfo, hadoopPath, and stream either need 
to be changed from private to protected, or getters and setters need to be 
provided. 
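
To illustrate the memory concern above, here is a minimal sketch of one way a 
SAX handler could write JSON text to a Writer as events arrive instead of 
building the whole document first. It is not the converter described in this 
comment: the class name StreamingXmlToJson, the tag/attrs/children mapping, and 
the convert helper are all assumptions made for illustration, it uses only JDK 
classes rather than org.json, and it says nothing about how JSONRecordReader 
would consume the output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: streams SAX events straight to JSON text. */
class StreamingXmlToJson extends DefaultHandler {
    private final Writer out;
    // One flag per open element: has its "children" array received an item yet?
    private final Deque<Boolean> hasItem = new ArrayDeque<>();
    // Buffers only the text run between two tags, never the whole document.
    private final StringBuilder text = new StringBuilder();

    StreamingXmlToJson(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        flushText();
        separator();
        write("{\"tag\":" + quote(qName) + ",\"attrs\":{");
        for (int i = 0; i < atts.getLength(); i++) {
            write((i > 0 ? "," : "") + quote(atts.getQName(i)) + ":" + quote(atts.getValue(i)));
        }
        write("},\"children\":[");
        hasItem.push(false);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        flushText();
        hasItem.pop();
        write("]}");
    }

    // Emits pending non-blank text as a string item of the current children array.
    private void flushText() {
        String s = text.toString().trim();
        text.setLength(0);
        if (!s.isEmpty()) {
            separator();
            write(quote(s));
        }
    }

    // Writes a comma before every item except the first in the current array.
    private void separator() {
        if (hasItem.isEmpty()) {
            return; // root element: nothing precedes it
        }
        if (hasItem.peek()) {
            write(",");
        } else {
            hasItem.pop();
            hasItem.push(true);
        }
    }

    private void write(String s) {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Minimal JSON string escaping: quotes, backslashes, control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Convenience wrapper for small inputs; real use would pass a file or stream Writer.
    static String convert(String xml) {
        StringWriter sink = new StringWriter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new StreamingXmlToJson(sink));
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return sink.toString();
    }
}
```

With this mapping, every element becomes an object with a "children" array, so 
repeated child names need no special handling and the handler only keeps one 
flag per open element. The trade-off is a more verbose JSON shape than a 
name-keyed object, which may or may not suit the queries people want to run.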

Regarding XSDs, I am considering whether the dfs configuration could take an 
additional per-workspace option for the XML file type, holding a list/array of 
XSDs, so that any document in that workspace must adhere to the referenced 
XSDs; otherwise it will not be considered by Drill.
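
The filtering idea above could be sketched with the JDK's javax.xml.validation 
API roughly as follows. The class name WorkspaceXsdFilter and its accepts method 
are hypothetical, nothing here is an existing Drill option or class, and the 
XSDs are passed in as strings only to keep the sketch self-contained; a real 
version would presumably load them from the paths listed in the workspace 
configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical sketch: accepts only documents that validate against a workspace's XSDs. */
class WorkspaceXsdFilter {
    private final Schema schema;

    // xsdTexts stands in for the per-workspace XSD list proposed above (an assumption).
    WorkspaceXsdFilter(List<String> xsdTexts) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Source[] sources = xsdTexts.stream()
                .map(xsd -> new StreamSource(new StringReader(xsd)))
                .toArray(Source[]::new);
        try {
            // Compiles all listed XSDs into one combined schema.
            schema = factory.newSchema(sources);
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid XSD in workspace configuration", e);
        }
    }

    // Returns true when the document is well-formed and valid; false means "skip this file".
    boolean accepts(String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or does not conform to the XSDs
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A Schema object is immutable and can be shared, while a new Validator is created 
per document, so one filter instance per workspace would be enough in this 
sketch.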

I will fill in the document, but I believe adding the information to the JIRA 
itself makes it more visible to other people in the community.

Best regards,
Magnus

> Support XML Querying (selects/projections, no writing)
> ------------------------------------------------------
>
>                 Key: DRILL-3878
>                 URL: https://issues.apache.org/jira/browse/DRILL-3878
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: Future
>            Reporter: Edmon Begoli
>              Labels: features
>             Fix For: Future
>
>   Original Estimate: 3,360h
>  Remaining Estimate: 3,360h
>
> Support querying of XML documents (as read-only selects; writing should be 
> implemented as a different feature that brings its own set of challenges).
> We should consider reading trivial, schema-less XML documents, 
> DTD-oriented ones, and also schema-defined ones.
> Also, we should consider direct querying vs. using converter tools to change 
> the representation from XML to JSON, CSV, etc.
> Design and Implementation discussion, notes, ideas and implementation 
> suggestions should be captured here:
> https://docs.google.com/document/d/1oS-cObSaTlAmuW_XghDLmHbBEorLl0z-axaHnjy7vg0/edit?usp=sharing
>  
> (no vandalism, please)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
