[ https://issues.apache.org/jira/browse/DAFFODIL-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023227#comment-18023227 ]
Mike Beckerle commented on DAFFODIL-2935:
-----------------------------------------

Study of the Exificient code shows that it is *not* taking advantage of possibly being presented with the same interned strings repeatedly. For every string it is presented as an element name via the SAX API, it decides whether that string is the same as other strings only by character comparison.

It seems like the algorithm could be improved to do mostly constant-time retrieval of the necessary element ID number given the element name, just by adding a hash table caching strategy. If the element name is not found in the cache, it would use the exact same algorithm as in the code today, and just cache the result in a hash table.

Likely the hash tables have to be nested, as the integers assigned to represent the strings are element-nesting (stack discipline) relative.

> EXI output from parse should be faster than textual XML
> -------------------------------------------------------
>
>                 Key: DAFFODIL-2935
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2935
>             Project: Daffodil
>          Issue Type: Improvement
>          Components: Back End
>    Affects Versions: 3.9.0
>            Reporter: Mike Beckerle
>            Priority: Major
>
> Recent performance work showed that EXI, while creating smaller output sizes,
> is not faster.
> This seems to have something to do with using the SAX API.
> Daffodil *should* be able to pass the same element name objects (exact same
> object reference) repeatedly to the start-element and end-element SAX events.
> These strings are stored in the element ERD data structure and should not
> need to be allocated/constructed or even traversed for each element. (If
> we're not storing the exact right string on the ERD, it should be added so
> it's sitting there ready to use.)
> We also don't know if Exificient library is able to take advantage of the
> fact that every start-element and end-element event for the same element
> should be passed the exact same object reference for the string identifier(s)
> of the element.
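
A rough illustration of the caching strategy proposed in the comment above, as a minimal sketch only. None of the names here (ElementIdCache, idFor, slowLookup, contextId) come from Exificient or Daffodil. The sketch assumes the nesting context can be identified by an integer, and that the existing character-comparison lookup can be wrapped as a function of (context, element name). The choice of IdentityHashMap for the inner table is also an assumption, made to show lookup by object reference.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: none of these names come from Exificient or Daffodil.
final class ElementIdCache {
    // Stand-in for the existing lookup that compares names character by character.
    private final BiFunction<Integer, String, Integer> slowLookup;

    // Outer table: one entry per nesting context (assumed to be identified by an int).
    // Inner table: keyed by String reference identity, not by character content.
    private final Map<Integer, Map<String, Integer>> byContext = new HashMap<>();

    // Counts fallbacks to the slow path, only so the effect of the cache is observable.
    int slowLookups = 0;

    ElementIdCache(BiFunction<Integer, String, Integer> slowLookup) {
        this.slowLookup = slowLookup;
    }

    int idFor(int contextId, String elementName) {
        Map<String, Integer> inner =
            byContext.computeIfAbsent(contextId, k -> new IdentityHashMap<>());
        Integer id = inner.get(elementName);
        if (id == null) {
            // Cache miss: run the unchanged slow algorithm once and remember the result.
            slowLookups++;
            id = slowLookup.apply(contextId, elementName);
            inner.put(elementName, id);
        }
        return id;
    }

    public static void main(String[] args) {
        // Toy slow lookup: the id depends on both the context and the name.
        ElementIdCache cache = new ElementIdCache((ctx, name) -> ctx * 100 + name.length());
        String name = "record"; // the same reference is reused, as an ERD-held name would be
        for (int i = 0; i < 1000; i++) {
            cache.idFor(1, name);
        }
        System.out.println("slow lookups: " + cache.slowLookups);
    }
}
```

With an identity-keyed inner table, a caller that passes an equal but distinct String object still gets the right ID, but only via the slow path and a new cache entry. So the benefit depends on Daffodil passing the exact same object reference each time, as the issue description asks for. A plain HashMap keyed by String content would be the more forgiving alternative.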