
Rupert Westenthaler (Created) (JIRA) Thu, 13 Oct 2011 14:27:37 -0700

Weak Performance of "application/json+rdf" serializer on big TripleCollections 
and Serializer/Parser using Platform encoding instead of UTF-8
--------------------------------------------------------------------------------------------------------------------------------------------


                 Key: CLEREZZA-643
                 URL: https://issues.apache.org/jira/browse/CLEREZZA-643
             Project: Clerezza
          Issue Type: Improvement
            Reporter: Rupert Westenthaler


Both the "application/json+rdf" serializer and the parser use the platform's 
default encoding instead of UTF-8.
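A minimal sketch of the difference, assuming plain java.io streams (this is not 
the Clerezza code or the patch): new OutputStreamWriter(out) and 
new InputStreamReader(in) silently use the JVM's default charset, while passing 
StandardCharsets.UTF_8 fixes the encoding on both the serializer and the parser 
side. The class Utf8Io is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating explicit UTF-8 I/O; not part of Clerezza.
final class Utf8Io {

    // Serializer side: encode text as UTF-8 regardless of the platform default.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // new OutputStreamWriter(out) would pick the platform default charset instead.
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) before the bytes are read here.
        return out.toByteArray();
    }

    // Parser side: decode the stream with the same explicit charset.
    static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        // new InputStreamReader(in) would also fall back to the platform default.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

On JDK 18 and later the default charset is UTF-8 (JEP 400), but naming the 
charset explicitly keeps the behaviour identical on older JVMs and on systems 
configured with a different default.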

In addition, the serializer performs very poorly on big graphs (at least when 
using SimpleMGraph).

After some digging in the code, I came to the conclusion that this is caused by 
the multiple TripleCollection.filter(..) calls: first to find all predicates of 
a subject, and then to find all objects for each subject/predicate combination. 
If each filter(..) call has to scan the whole collection, the total work grows 
with the number of triples times the number of such calls. Trying to serialize 
a graph with 50k triples took several minutes at 100% CPU.
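To make the cost concrete, here is a simplified, hypothetical model of the 
filter-based loop. The Triple class, filter(..) and the triplesExamined counter 
are illustrative stand-ins, not Clerezza's API. It assumes an unindexed filter 
that scans every triple and that subjects are collected in one pass over the 
graph. With one filter call per subject plus one per subject/predicate pair, a 
graph of n triples with n distinct subjects examines 2n^2 triples.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of serializing via repeated filter(..) calls.
final class NaiveGrouping {
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    static long triplesExamined; // total work done by all filter(..) calls

    // Unindexed filter: scans every triple; null acts as a wildcard.
    static List<Triple> filter(List<Triple> graph, String s, String p) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : graph) {
            triplesExamined++;
            if ((s == null || s.equals(t.s)) && (p == null || p.equals(t.p))) {
                result.add(t);
            }
        }
        return result;
    }

    // Returns the number of objects visited while grouping by subject and predicate.
    static int serializeNaively(List<Triple> graph) {
        triplesExamined = 0;
        Set<String> subjects = new LinkedHashSet<>();
        for (Triple t : graph) subjects.add(t.s);
        int objectsWritten = 0;
        for (String s : subjects) {
            // 1st filter: all triples of this subject, to collect its predicates
            Set<String> predicates = new LinkedHashSet<>();
            for (Triple t : filter(graph, s, null)) predicates.add(t.p);
            for (String p : predicates) {
                // 2nd filter: all objects for this subject/predicate pair
                objectsWritten += filter(graph, s, p).size();
            }
        }
        return objectsWritten;
    }
}
```

For 100 triples with distinct subjects this model examines 20,000 triples; 
doubling the graph quadruples the work, which illustrates why the approach 
degrades so badly on large graphs.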

With the next comment I will provide a patch with an implementation based on a 
sorted array of the triples. With this approach, graphs with 100k triples can 
be serialized in about 1 second. The patch also switches the encoding to UTF-8.
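The sort-based idea can be sketched as follows, using a hypothetical Triple 
stand-in (an illustration, not the attached patch): sort the triples once by 
subject, predicate and object, then walk the array and open a new subject or 
predicate group whenever the value changes. The cost is one O(n log n) sort plus 
a linear pass, with no filter(..) calls. A real serializer would write JSON 
tokens at the points where this sketch creates a new map or list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of sort-then-scan grouping; not Clerezza's API.
final class SortedGrouping {
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Orders triples by subject, then predicate, then object.
    static final Comparator<Triple> SPO = Comparator
            .comparing((Triple t) -> t.s)
            .thenComparing(t -> t.p)
            .thenComparing(t -> t.o);

    // After sorting, all triples of a subject (and of a subject/predicate pair)
    // are adjacent, so a single pass builds subject -> predicate -> objects.
    static Map<String, Map<String, List<String>>> group(Iterable<Triple> graph) {
        List<Triple> list = new ArrayList<>();
        for (Triple t : graph) list.add(t);
        Triple[] triples = list.toArray(new Triple[0]);
        Arrays.sort(triples, SPO);

        Map<String, Map<String, List<String>>> result = new LinkedHashMap<>();
        Map<String, List<String>> currentSubject = null;
        List<String> currentObjects = null;
        String lastS = null;
        String lastP = null;
        for (Triple t : triples) {
            if (!t.s.equals(lastS)) {   // subject changed: open a new subject block
                currentSubject = new LinkedHashMap<>();
                result.put(t.s, currentSubject);
                lastS = t.s;
                lastP = null;           // force a new predicate block as well
            }
            if (!t.p.equals(lastP)) {   // predicate changed: open a new object list
                currentObjects = new ArrayList<>();
                currentSubject.put(t.p, currentObjects);
                lastP = t.p;
            }
            currentObjects.add(t.o);
        }
        return result;
    }
}
```

For example, the unsorted input (b,p,2), (a,q,1), (a,p,3), (a,p,1) is grouped 
into a -> {p: [1, 3], q: [1]} and b -> {p: [2]}.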


