Hi, 

   [ I just opened a bug report for this, but it was suggested that a wider 
discussion on how to do it would be useful on this list. ]

  In a Linked Data environment, servers have to fetch data off the web, and
that data can be served very slowly. So one wants to avoid tying up one thread
per connection (1 thread = 0.5 to 1 MB, approximately). This is why Java NIO
was developed, why network frameworks such as Netty are so popular, why
asynchronous HTTP client libraries such as
https://github.com/sonatype/async-http-client are more and more numerous,
and why actor frameworks such as http://akka.io/, which support relatively
lightweight actors (about 500 bytes per actor), are growing more visible.

Unless I am mistaken, the only way to parse some content is through methods
that take an InputStream, such as this:

    val m = ModelFactory.createDefaultModel()
    m.getReader(lang.jenaLang).read(m, in, base.toString)

That read call *blocks*: the thread that calls it will spend all its time
reading the data in, HOWEVER SLOWLY it is sent. Would it be possible to have
an API which allows one to parse a document in chunks, as they arrive from
the input?

Without that, each request for a remote resource ties up a minimum of 0.5-1 MB,
plus the cost of context switching between threads (which is known to be very
high). So if you have to fetch 500 remote resources before you can even get
started, you use up 250-500 MB, whilst slowing your machine down dramatically
due to the switching. With akka actors you would instead use
500 bytes * 500 = 250,000 bytes = 250 kB = 1/4 MB, plus perhaps a few threads.
With plain NIO you need the same or even less: one NIO thread can read as much
input as it can handle, and you probably need just a few worker threads if the
parsing is more work than the reading. So just like that we can save a lot of
memory.

   Having said that, what is the best way to do this?

   An (ugly?) solution that would work is just to have a method
    
    reader.write(byteArray)

   So instead of the thread doing the reading itself, the IO layer can pass
blocks of bytes straight to the reader, and on to the model, as those blocks
come along.
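
   To make the idea a bit more concrete, here is a minimal sketch of what
such a push-style reader could look like for a line-based format such as
N-Triples. ChunkedLineReader, its write/close methods and the onLine callback
are names I made up for illustration; none of this is existing Jena API, and
a real version would hand each line to a parser rather than to a callback.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical push-style reader (NOT part of Jena). The IO layer calls
// write() whenever a chunk arrives; no call here ever waits for input.
final class ChunkedLineReader(onLine: String => Unit) {
  // Bytes of the current, still incomplete line.
  private val pending = new ByteArrayOutputStream()

  /** Feed the next chunk; each completed non-empty line goes to onLine. */
  def write(chunk: Array[Byte]): Unit = {
    var start = 0
    var i = 0
    while (i < chunk.length) {
      // 0x0A never occurs inside a multi-byte UTF-8 sequence, so it is
      // safe to split on the raw byte before decoding.
      if (chunk(i) == '\n'.toByte) {
        pending.write(chunk, start, i - start)
        emit()
        start = i + 1
      }
      i += 1
    }
    // Keep the tail of the chunk until the rest of its line arrives.
    pending.write(chunk, start, chunk.length - start)
  }

  /** Signal end of input; flushes a last line that has no newline. */
  def close(): Unit = if (pending.size > 0) emit()

  private def emit(): Unit = {
    val line = new String(pending.toByteArray, UTF_8)
    pending.reset()
    if (line.nonEmpty) onLine(line)
  }
}
```

   The only state kept between calls is the unfinished line, so one such
object per connection costs a few hundred bytes plus that buffer, and it does
not matter where the network happens to cut the chunks.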

   It would be better of course if the structure passed were immutable, and
even better if it could be an NIO byte buffer, as that reduces the need even
to copy data, but I guess that the Jena parsers were not written with that in
mind.
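
   For what it is worth, plain NIO already gives you the "not changeable and
not copied" part: a read-only view of a ByteBuffer shares the underlying
bytes but rejects writes. A small sketch, where ReadOnlyChunks, handOver and
countLines are hypothetical names of mine standing in for the IO layer and
for whatever the parser side would do with the chunk:

```scala
import java.nio.ByteBuffer

object ReadOnlyChunks {
  /** Hypothetical IO-layer step, called after `buf` was filled from a channel. */
  def handOver(buf: ByteBuffer): ByteBuffer = {
    buf.flip()              // switch the buffer from filling to reading
    buf.asReadOnlyBuffer()  // same bytes, no copy, put() is rejected
  }

  /** Hypothetical consumer: scans the chunk in place, without copying it. */
  def countLines(chunk: ByteBuffer): Int = {
    var n = 0
    while (chunk.hasRemaining) {
      if (chunk.get() == '\n'.toByte) n += 1
    }
    n
  }
}
```

   The view only protects against writes through that view; the IO layer
still must not refill the original buffer until the consumer is done with it.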

   I did open JENA-203 so that once we agree on a solution we can send in
some patches.

 https://issues.apache.org/jira/browse/JENA-203

        Henry


Social Web Architect
http://bblfish.net/
