[ 
https://issues.apache.org/jira/browse/DRILL-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230167#comment-14230167
 ] 

Jacques Nadeau commented on DRILL-1774:
---------------------------------------

Thanks [~paulrbrown].  Actually, the issue is that we wanted to rewind the 
stream without generating a lot of extra GC overhead and complexity (since we 
only rewind once every 5000 records or so).  Previously, that had us doing 
two-pass parsing.  With the latest Jackson, we were able to create a rewindable 
parser by subclassing the UTF8 parser, which allows a single parser and shared 
canonicalization.  We're still doing a double conversion for strings 
(utf8 -> utf16 -> utf8), but changing the parser to avoid it would have been 
substantially more complicated (Drill uses utf8 internally).  I'll be sure to 
drop by the Jackson lists to discuss.
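
To illustrate the rewind idea and the double conversion in isolation, here is a 
minimal sketch in plain Java.  It is not the Drill patch and does not use 
Jackson: RewindSketch and RewindableInput are hypothetical names, and the input 
is assumed to be newline-delimited records held in a single byte array.

```java
import java.nio.charset.StandardCharsets;

public class RewindSketch {

    /** Hypothetical rewindable reader over newline-delimited UTF-8 records. */
    public static final class RewindableInput {
        private final byte[] buf;
        private int pos;   // next byte to read
        private int mark;  // position saved by mark()

        public RewindableInput(byte[] buf) {
            this.buf = buf;
        }

        /** Remember the current position, e.g. the start of the next record. */
        public void mark() {
            mark = pos;
        }

        /** Jump back to the last mark; no copying and no new objects. */
        public void rewind() {
            pos = mark;
        }

        public boolean hasNext() {
            return pos < buf.length;
        }

        /** Reads one record, decoding UTF-8 bytes into a (UTF-16) String. */
        public String nextRecord() {
            int start = pos;
            while (pos < buf.length && buf[pos] != '\n') {
                pos++;
            }
            String record = new String(buf, start, pos - start, StandardCharsets.UTF_8);
            if (pos < buf.length) {
                pos++; // skip the newline separator
            }
            return record;
        }
    }

    public static void main(String[] args) {
        byte[] data = "{\"a\":1}\n{\"a\":2}\n".getBytes(StandardCharsets.UTF_8);
        RewindableInput in = new RewindableInput(data);

        in.mark();
        String first = in.nextRecord();
        in.rewind();                      // re-read the same record in place
        String again = in.nextRecord();
        System.out.println(first.equals(again));

        // The double conversion: the bytes were decoded to a String above
        // (utf8 -> utf16) and are encoded again here (utf16 -> utf8).
        byte[] back = again.getBytes(StandardCharsets.UTF_8);
        System.out.println(back.length);
    }
}
```

Because the rewind only resets an index, its cost does not depend on how rarely 
it happens; the remaining overhead in this sketch is the String allocation in 
nextRecord(), which corresponds to the utf8 -> utf16 step mentioned above.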

> Improve JSON Performance
> ------------------------
>
>                 Key: DRILL-1774
>                 URL: https://issues.apache.org/jira/browse/DRILL-1774
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>            Reporter: Jacques Nadeau
>            Assignee: Jacques Nadeau
>             Fix For: 0.7.0
>
>         Attachments: DRILL-1774.patch
>
>
> There are multiple reasons that JSON performance is subpar.  We need to 
> update the reader to do the following:
> - Avoid double reading (for rewinding)
> - Take advantage of Jackson interning/hashing of field names.
> - Improve field selection behavior. (avoid object creation and multiple 
> traversals)
> - Better delineation of allTextMode behavior


