Re: SPARQL 1.1 Update in ARQ 2.8.7

Andy Seaborne Thu, 03 Feb 2011 06:37:50 -0800


On 01/02/11 16:36, Stephen Allen wrote:

Andy,

I have started implementing the serializer (SinkBindingOutput) by using
org.openjena.riot.SinkQuadOutput as a guide and using OutputLangUtils to
print out the variable/values.  I created the deserializer (LangBindings) by
extending org.openjena.riot.lang.LangNTuple.  I'm using the paired var/value
format you described below.  For now I'll start with a straightforward
implementation with no compression, but I like your ideas in this area.  I'll
try to do some measurements to see if any other compression is beneficial.


Sounds good.


I did not define an org.openjena.riot.Lang enum for the deserializer
(because it isn't an RDF language) but I was planning on putting the
LangBindings class in the org.openjena.riot.lang package.


As good a place as any at the moment.

I've just been digging out some code that does tuple I/O from an experimental system from a while ago (a clustered query engine ..).


For determining when to spill bindings to disk, there are a few options (in
order of least difficulty):
1) Store binding objects in a list, and then spill them to disk once the
list size passes a threshold
2) Start serializing bindings immediately into something like
DeferredFileOutputStream [1] that will retain the data in memory until it
passes a memory threshold
3) Do 1), but try to calculate the size of the bindings in memory and use a
memory threshold instead of a number of bindings threshold

I think 1) should be sufficient if we come up with a reasonable guess for
the threshold.  Option 2) lets you get much better control over the memory
management, but I think the cost of unnecessarily serializing/deserializing
small queries may be too high.

Personally, I'd encapsulate this in a policy object and have different implementations. Well, maybe just one implementation: case 1 with a settable threshold for testing. (3) then becomes a smarter policy object to be done later, if needed.
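
As a rough illustration of the policy-object idea, something along these lines might work. This is only a sketch: SpillPolicy, CountThresholdPolicy and SpillingBuffer are made-up names, not existing ARQ or RIOT classes, and the actual write to disk is left as a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical policy interface: decides when buffered items should be spilled. */
interface SpillPolicy<T> {
    /** Called for every item added to the in-memory buffer. */
    void added(T item);

    /** True once the buffer should be written out to disk. */
    boolean shouldSpill();

    /** Called after a spill so the policy can start counting again. */
    void reset();
}

/** Case 1: spill once the number of buffered items reaches a settable threshold. */
class CountThresholdPolicy<T> implements SpillPolicy<T> {
    private final int threshold;
    private int count = 0;

    CountThresholdPolicy(int threshold) {
        this.threshold = threshold;
    }

    public void added(T item) { count++; }

    public boolean shouldSpill() { return count >= threshold; }

    public void reset() { count = 0; }
}

/** Hypothetical buffer that asks the policy after every add whether to spill. */
class SpillingBuffer<T> {
    private final SpillPolicy<T> policy;
    private final List<T> memory = new ArrayList<>();
    private int spillCount = 0;

    SpillingBuffer(SpillPolicy<T> policy) {
        this.policy = policy;
    }

    void add(T item) {
        memory.add(item);
        policy.added(item);
        if (policy.shouldSpill()) {
            spill();
        }
    }

    private void spill() {
        // Placeholder: a real implementation would serialize the buffered
        // items to a temporary file here before dropping them from memory.
        spillCount++;
        memory.clear();
        policy.reset();
    }

    int spillCount() { return spillCount; }

    int inMemory() { return memory.size(); }
}
```

A memory-estimating policy for case (3) would then be another SpillPolicy implementation that sums an estimated size per item in added(), with no change needed to the buffer itself.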


I share your concern on (2) about the serialization to memory costs.

        Andy
