GitHub user kunal0137 opened a pull request:
https://github.com/apache/drill/pull/1130
Drill 3878
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/magpierre/drill DRILL-3878
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1130.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1130
----
commit 844f34a16e75719535ff94c54d5337746ea18c20
Author: MPierre <magnus.pierre@...>
Date: 2015-11-05T14:42:06Z
Initial commit
XML support in Apache Drill
commit 592b3af06c2ff45198136577561f2ec1f7caaee0
Author: MPierre <magnus.pierre@...>
Date: 2015-11-05T21:21:42Z
Fixed some minor outstanding bugs
EasyRecordReader has a new field, userName, and I forgot to change
jsonProcessor from private to protected.
commit 8fad811edab43d3499b41bb66cb419248d11208f
Author: MPierre <magnus.pierre@...>
Date: 2015-11-09T08:59:08Z
Merge remote-tracking branch 'apache/master' into DRILL-3878
commit 38f4884fe9b8456c1cde5de44c1e54177301a974
Author: MPierre <magnus.pierre@...>
Date: 2016-03-16T11:33:15Z
Syncing to latest release of drill
commit 909c5dec8bdb01bfe0ed358ebc64c959785738df
Author: MPierre <magnus.pierre@...>
Date: 2016-03-16T11:34:10Z
syncing to latest release of drill
commit 597d9657d613fa35df2c10dff23681545b13e531
Author: MPierre <magnus.pierre@...>
Date: 2016-03-18T08:55:51Z
Cleaned up deliver
Cleaned up the output generated by the SAX Parser, and removed all
unnecessary code.
commit 0cfaa31ab9af89833417288a290d21d0ce88c4ac
Author: MPierre <magnus.pierre@...>
Date: 2016-03-18T10:29:51Z
Merge remote-tracking branch 'apache/master' into DRILL-3878
commit aaaff05eb921125ad64854c89c179292c4441fb7
Author: MPierre <magnus.pierre@...>
Date: 2016-03-24T13:05:53Z
Adjusted output from Parser to fit Drill better
I have adjusted the SAX parser to produce JSON that Drill likes. Among
the corrections: empty objects are removed from the tree that is built,
and repeating values are consolidated into arrays.
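
The two corrections described above can be sketched roughly as follows. This is only an illustration, not the code from the pull request: it assumes the parsed tree is held as nested java.util.Map and java.util.List objects, and the names TreeCleanup, addChild and pruneEmpty are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Drill: the tree is assumed to be nested Maps and Lists.
class TreeCleanup {

    // Add a child under a key; a key seen more than once is consolidated into one list.
    @SuppressWarnings("unchecked")
    static void addChild(Map<String, Object> parent, String key, Object value) {
        Object existing = parent.get(key);
        if (existing == null) {
            parent.put(key, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            List<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            parent.put(key, values);
        }
    }

    // Recursively drop empty objects; returns true if this node itself ends up empty.
    @SuppressWarnings("unchecked")
    static boolean pruneEmpty(Map<String, Object> node) {
        Iterator<Map.Entry<String, Object>> it = node.entrySet().iterator();
        while (it.hasNext()) {
            Object v = it.next().getValue();
            if (v instanceof Map && pruneEmpty((Map<String, Object>) v)) {
                it.remove();
            } else if (v instanceof List) {
                List<Object> list = (List<Object>) v;
                list.removeIf(e -> e instanceof Map && pruneEmpty((Map<String, Object>) e));
                if (list.isEmpty()) {
                    it.remove();
                }
            }
        }
        return node.isEmpty();
    }
}
```

Under these assumptions, adding "item" twice yields a single "item" key holding a two-element list, and a child that contains nothing is removed when pruneEmpty is called on the root.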
commit ba19a356d850224c01b9e807183377b46cf7e545
Author: MPierre <magnus.pierre@...>
Date: 2016-03-24T13:10:57Z
Fixed small typo
commit 8ba6705be42c7847d469611ab070b869e0c76d8c
Author: MPierre <magnus.pierre@...>
Date: 2016-03-24T21:17:30Z
Further enhancements of the output format to fit Drill
commit e2273f13b8e0136a33c1576c4667f16e23e1631c
Author: MPierre <magnus.pierre@...>
Date: 2016-03-24T21:22:41Z
Removed comment
commit c1b6ff8375a7e3c8161167d1a5f2b34ba165e750
Author: MPierre <magnus.pierre@...>
Date: 2016-03-29T12:48:53Z
Merge remote-tracking branch 'apache/master' into DRILL-3878
commit aacdec286781bc09dfc770044d4468ad7d83a6fc
Author: MPierre <magnus.pierre@...>
Date: 2016-03-29T18:24:04Z
Corrected if style violations
commit 980be53b7192a8b09f5932eb31b3a70a17873300
Author: MPierre <magnus.pierre@...>
Date: 2016-04-22T11:53:49Z
Addressing data volume to JSON Parser
The SAX parser streams through the files it reads using an event push
model. This filter runs as part of the SAX event handler and
pre-qualifies events, i.e. it decides whether the data is relevant to
the query and drops events not needed for the query result. This leaves
less volume to convert from XML to JSON and less volume to send to the
JSON Record Reader, and makes it possible to get specific information
from large documents without exhausting memory.
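
A minimal sketch of this kind of pre-filtering, using the standard org.xml.sax.helpers.XMLFilterImpl class. It is not the filter from the pull request: the class name ElementPreFilter, the idea of selecting by a set of element names, and the helper keptElements are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: forwards only events inside elements whose name is in "wanted".
class ElementPreFilter extends XMLFilterImpl {
    private final Set<String> wanted;
    private int depthInside = 0; // greater than 0 while inside a wanted element

    ElementPreFilter(XMLReader parent, Set<String> wanted) {
        super(parent);
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depthInside > 0 || wanted.contains(qName)) {
            depthInside++;
            super.startElement(uri, localName, qName, atts);
        }
        // Otherwise the event is dropped and never reaches the downstream handler.
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depthInside > 0) {
            depthInside--;
            super.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depthInside > 0) {
            super.characters(ch, start, length);
        }
    }

    // Parse a string and return the element names that survived the filter.
    static List<String> keptElements(String xml, Set<String> wanted) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ElementPreFilter filter = new ElementPreFilter(reader, wanted);
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a) {
                seen.add(q);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

Because the dropped events never reach the downstream handler, whatever converts the events to JSON only has to hold the selected subtrees. Note that startDocument and endDocument are still forwarded by XMLFilterImpl in this sketch.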
commit 2fe0aaf80bd4a3e616bc3c9c8d46c26472232bb0
Author: MPierre <magnus.pierre@...>
Date: 2016-04-22T11:59:26Z
Embedding JSONRecordReader instead of extending
Based on feedback, I have embedded the JSON Record Reader instead of
extending it. Since I need to call a constructor that was previously
private, as well as methods that were previously private, I had to
change them to public. I have restored the internal variables that were
previously changed to protected back to private. The XML Record Reader
now calls the pre-filtering class to reduce the total volume handled by
both the XML-to-JSON parser and the JSON Record Reader.
commit 8f4ca71183ff3f688fb0b2460064227ac8ebeb7e
Author: MPierre <magnus.pierre@...>
Date: 2016-04-22T12:03:01Z
Improved output format and added EndDocument method
Added EndDocument to support the XML filter; without it the stream is
not closed and the filtering appears to hang. Also renamed generated
arrays from _array to :drill_array to distinguish them from tag
names that might already exist in the document.
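
Why a missing end-of-document step can look like a hang can be sketched as follows. This is an assumption-laden illustration, not the handler from the pull request: ClosingHandler and parseTo are hypothetical names, and the output is assumed to go to a java.io.Writer that a consumer reads until it is closed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: writes character data to "out" and closes it at end of document.
class ClosingHandler extends DefaultHandler {
    private final Writer out;

    ClosingHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            out.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Without this override the writer is never closed, so a consumer
    // waiting for end-of-stream would keep waiting.
    @Override
    public void endDocument() throws SAXException {
        try {
            out.close();
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Parse an XML string, sending its text content to the given writer.
    static void parseTo(String xml, Writer out) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new ClosingHandler(out));
    }
}
```

If the writer were one end of a pipe, the reading side would only see end-of-stream once endDocument has run, which matches the symptom described in the commit message.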
----