[jira] [Commented] (JENA-178) SPARQL Results serialization is slow for some formats with large result sets

Damian Steer (Commented) (JIRA) Sat, 17 Dec 2011 02:59:04 -0800

    [ 
https://issues.apache.org/jira/browse/JENA-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171519#comment-13171519
 ]


Damian Steer commented on JENA-178:
-----------------------------------

Exploring a little more, I noticed that there were an awful lot of flushes going
on. org.openjena.atlas.io.IndentedWriter was responsible:

public void newline()
{
    ....
    // Note that PrintWriters do not autoflush by default
    // so if layered over a PrintWriter, need to flush that as well.
    flush() ;
}

In practice this negates the buffering that BufferingWriter provides, and removing
the flush() call halves the XML output time.
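
A rough way to see the effect, sketched with plain JDK classes rather than Jena's own (FlushPerLineDemo and its methods are hypothetical names): count how many write calls actually reach the underlying stream when a BufferedWriter is flushed after every line, as IndentedWriter.newline() does, versus only at the end. Each flush pushes a tiny chunk downstream, so the buffer never gets a chance to batch anything.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class FlushPerLineDemo {

    /** OutputStream that discards data but counts how many write calls reach it. */
    static final class CountingOutputStream extends OutputStream {
        int writeCalls = 0;
        @Override public void write(int b) { writeCalls++; }
        @Override public void write(byte[] b, int off, int len) { writeCalls++; }
    }

    /** Writes `lines` short lines, optionally flushing after each newline; returns write calls seen downstream. */
    public static int countUnderlyingWrites(int lines, boolean flushEachLine) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        try (Writer w = new BufferedWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            for (int i = 0; i < lines; i++) {
                w.write("<binding name=\"s\"><uri>http://example/" + i + "</uri></binding>");
                w.write('\n');
                if (flushEachLine) {
                    w.flush(); // mimics the flush() at the end of IndentedWriter.newline()
                }
            }
        } // close() performs one final flush in both cases
        return sink.writeCalls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("flush per newline: " + countUnderlyingWrites(10_000, true));
        System.out.println("flush at end only: " + countUnderlyingWrites(10_000, false));
    }
}
```

With flushing per line there is at least one downstream write per line; without it, the count drops to roughly the output size divided by the buffer size. The exact numbers depend on the JDK's internal buffer sizes.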

Next we have the RIOT BufferingWriter. Comparing:

writer = BufferingWriter.create(out); // RIOT

and

writer = new BufferedWriter(new OutputStreamWriter(out, "UTF-8")); // JDK

I found the former took three times as long as the latter to write random strings to a file.
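
A minimal harness for this kind of writer comparison might look like the following. It is a sketch under stated assumptions, not the code behind the 3x figure: it writes to a byte-counting in-memory sink rather than a file (to keep disk I/O out of the timing), uses a fixed random seed, and reports the median of several runs after one warm-up run. WriterBenchmark and its methods are hypothetical names. The RIOT writer could be plugged in through the factory argument, e.g. `out -> BufferingWriter.create(out)`, assuming that call returns a java.io.Writer (not verified here).

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;
import java.util.function.Function;

public class WriterBenchmark {

    /** Sink that discards output but counts bytes, so disk I/O does not dominate the timing. */
    static final class ByteCountingSink extends OutputStream {
        long bytes = 0;
        @Override public void write(int b) { bytes++; }
        @Override public void write(byte[] b, int off, int len) { bytes += len; }
    }

    /** Reproducible random lowercase ASCII strings (fixed seed so every writer sees identical input). */
    public static String[] randomStrings(int count, int maxLen, long seed) {
        Random rnd = new Random(seed);
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            char[] cs = new char[1 + rnd.nextInt(maxLen)];
            for (int j = 0; j < cs.length; j++) cs[j] = (char) ('a' + rnd.nextInt(26));
            out[i] = new String(cs);
        }
        return out;
    }

    /** Writes every string plus a newline through the writer built by `factory`; returns bytes that reached the sink. */
    public static long writeAll(Function<OutputStream, Writer> factory, String[] data) throws IOException {
        ByteCountingSink sink = new ByteCountingSink();
        try (Writer w = factory.apply(sink)) {
            for (String s : data) {
                w.write(s);
                w.write('\n');
            }
        }
        return sink.bytes;
    }

    /** Median wall-clock time in milliseconds over `runs` timed runs, after one untimed warm-up run. */
    public static double medianMillis(Function<OutputStream, Writer> factory, String[] data, int runs) throws IOException {
        writeAll(factory, data); // warm-up so JIT compilation is not included in the timings
        long[] t = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            writeAll(factory, data);
            t[r] = System.nanoTime() - start;
        }
        Arrays.sort(t);
        return t[runs / 2] / 1e6;
    }

    public static void main(String[] args) throws IOException {
        String[] data = randomStrings(200_000, 40, 42L);
        Function<OutputStream, Writer> jdk =
            out -> new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        System.out.printf("JDK BufferedWriter: %.1f ms%n", medianMillis(jdk, data, 5));
    }
}
```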
                
> SPARQL Results serialization is slow for some formats with large result sets
> ----------------------------------------------------------------------------
>
>                 Key: JENA-178
>                 URL: https://issues.apache.org/jira/browse/JENA-178
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: ARQ 2.8.9
>         Environment: Windows 7 Enterprise 64 bit
>            Reporter: Rob Vesse
>         Attachments: Jena178.java, TestArqSerializerPerformance.java, 
> XMLOutputSAX.java, XMLOutputStAX.java
>
>
> The SPARQL XML and JSON Result formats are very slow when the result set is 
> large.  This is surprising to me since both formats are relatively simple and 
> should lend themselves to fairly fast streaming serialization and parsing.
> The following are observed performance figures comparing SPARQL XML, SPARQL 
> JSON and SPARQL TSV results format.  This is the averaged time over 5 runs to 
> retrieve the first 50,000 triples from the dataset with a simple SELECT * 
> WHERE { ?s ?p ?o } LIMIT 50000 via an HTTP request to Fuseki and iterate over
> the results on the client.
> SPARQL XML = 15.25 seconds
> SPARQL JSON = 10.9 seconds
> SPARQL TSV = 0.54 seconds
> Now obviously TSV is way simpler to serialize and parse than XML/JSON, but
> these serializers and parsers should not be 20-30 times slower IMO.
> Also, for comparison, note that doing an equivalent CONSTRUCT { ?s ?p ?o }
> WHERE { ?s ?p ?o } LIMIT 50000 takes only about 2s and that is using RDF/XML 
> serialization which I would have expected to be slower because RDF/XML is 
> more complex to generate than either SPARQL XML/JSON results.  I haven't 
> dived into the code in detail to investigate why this is slow yet but do the 
> Jena team have any thoughts on this?
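
For reference, the kind of streaming serializer the report has in mind can be sketched with the JDK's StAX API (javax.xml.stream). This is a hypothetical illustration, not ARQ's own writer nor the attached XMLOutputStAX.java: the Term class is a made-up stand-in for Jena's Node, literal language tags and datatypes are omitted, and the element names follow the W3C SPARQL Query Results XML Format. Rows are written as they arrive from the iterator, and the output is flushed once at the end rather than per line.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.stream.*;

public class StreamingSparqlXml {

    static final String NS = "http://www.w3.org/2005/sparql-results#";

    /** Hypothetical RDF term: kind is "uri", "bnode" or "literal" (no lang/datatype support). */
    public static final class Term {
        final String kind, value;
        public Term(String kind, String value) { this.kind = kind; this.value = value; }
    }

    /** Streams rows to `out` one at a time; only the writer's own buffer holds pending output. */
    public static void write(OutputStream out, List<String> vars, Iterator<Map<String, Term>> rows)
            throws XMLStreamException, IOException {
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        XMLStreamWriter x = XMLOutputFactory.newInstance().createXMLStreamWriter(w);
        x.writeStartDocument("1.0");
        x.writeStartElement("sparql");
        x.writeDefaultNamespace(NS);
        x.writeStartElement("head");
        for (String v : vars) {
            x.writeEmptyElement("variable");
            x.writeAttribute("name", v);
        }
        x.writeEndElement(); // head
        x.writeStartElement("results");
        while (rows.hasNext()) {
            Map<String, Term> row = rows.next();
            x.writeStartElement("result");
            for (String v : vars) {
                Term t = row.get(v);
                if (t == null) continue; // unbound variables are simply omitted
                x.writeStartElement("binding");
                x.writeAttribute("name", v);
                x.writeStartElement(t.kind); // <uri>, <bnode> or <literal>
                x.writeCharacters(t.value);  // StAX escapes '<' and '&'
                x.writeEndElement();
                x.writeEndElement(); // binding
            }
            x.writeEndElement(); // result
        }
        x.writeEndElement(); // results
        x.writeEndElement(); // sparql
        x.writeEndDocument();
        x.flush(); // a single flush at the end, not one per row
        w.flush();
    }
}
```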
