[ https://issues.apache.org/jira/browse/JENA-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Damian Steer updated JENA-178:
------------------------------

    Attachment: Jena178.patch

This patch:

* Stops the indenter flushing on every new line.
* Replaces the XML output char escape function.
* Uses the same CharsetEncoder for all BufferingWriter conversions.

Before:

XML: 2651.63ms +- 77.02ms
TSV: 52.13ms +- 1.90ms
JSON: 2346.25ms +- 306.44ms

After:

XML: 541.63ms +- 11.52ms
TSV: 51.75ms +- 1.92ms
JSON: 600.13ms +- 6.62ms

So much better, but still needs some work. I should look at the SAX and StAX performance.

> SPARQL Results serialization is slow for some formats with large result sets
> ----------------------------------------------------------------------------
>
>                 Key: JENA-178
>                 URL: https://issues.apache.org/jira/browse/JENA-178
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: ARQ 2.8.9
>         Environment: Windows 7 Enterprise 64 bit
>            Reporter: Rob Vesse
>         Attachments: Jena178.java, Jena178.patch,
> TestArqSerializerPerformance.java, XMLOutputSAX.java, XMLOutputStAX.java
>
>
> The SPARQL XML and JSON Result formats are very slow when the result set is
> large. This is surprising to me since both formats are relatively simple and
> should lend themselves to fairly fast streaming serialization and parsing.
>
> The following are observed performance figures comparing SPARQL XML, SPARQL
> JSON and SPARQL TSV results format. This is the averaged time over 5 runs to
> retrieve the first 50,000 triples from the dataset with a simple SELECT *
> WHERE { ?s ?p ?o } LIMIT 50000 via a HTTP request to Fuseki and iterate over
> the results on the client.
> SPARQL XML = 15.25 seconds
> SPARQL JSON = 10.9 seconds
> SPARQL TSV = 0.54 seconds
>
> Now obviously TSV is way simpler to serialize and parse than XML/JSON, but
> these serializers and parsers should not be 20-30 times slower IMO.
>
> Also for comparison, note that doing an equivalent CONSTRUCT { ?s ?p ?o }
> WHERE { ?s ?p ?o } LIMIT 50000 takes only about 2s, and that is using RDF/XML
> serialization, which I would have expected to be slower because RDF/XML is
> more complex to generate than either of the SPARQL XML/JSON results formats.
> I haven't dived into the code in detail to investigate why this is slow yet,
> but do the Jena team have any thoughts on this?
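
For readers without the attachment to hand, here is a rough sketch of the kind of change the "Replaces the XML output char escape function" item in the patch notes could involve. It is not the code from Jena178.patch: the class XmlEscapeSketch and its methods are hypothetical names, and the approach (one pass over the string, writing unescaped runs in bulk straight to a Writer rather than building intermediate strings) is an assumption about what a cheaper escape routine might look like.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical illustration only; not taken from Jena178.patch or from ARQ.
final class XmlEscapeSketch {

    /** Writes s to out in a single pass, replacing the five XML special characters. */
    static void writeEscaped(Writer out, String s) throws IOException {
        int start = 0; // start of the pending run of characters that need no escaping
        for (int i = 0; i < s.length(); i++) {
            String replacement;
            switch (s.charAt(i)) {
                case '<':  replacement = "&lt;";   break;
                case '>':  replacement = "&gt;";   break;
                case '&':  replacement = "&amp;";  break;
                case '"':  replacement = "&quot;"; break;
                case '\'': replacement = "&apos;"; break;
                default:   continue; // ordinary character: extend the current run
            }
            out.write(s, start, i - start); // emit the run before the special character
            out.write(replacement);
            start = i + 1;
        }
        out.write(s, start, s.length() - start); // emit the trailing run
    }

    /** Convenience wrapper that returns the escaped text as a String. */
    static String escape(String s) {
        StringWriter w = new StringWriter(s.length() + 16);
        try {
            writeEscaped(w, s);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringWriter does not actually throw
        }
        return w.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b && c > \"d\""));
    }
}
```

In a serializer, writeEscaped would be pointed at a buffered writer that is flushed once at the end of the result set rather than per line, which is in the spirit of the first item in the patch notes.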