[ 
https://issues.apache.org/jira/browse/JENA-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967228#comment-14967228
 ] 

ASF GitHub Bot commented on JENA-624:
-------------------------------------

Github user ajs6f commented on the pull request:

    https://github.com/apache/jena/pull/94#issuecomment-149916761
  
    Thanks for the feedback, @afs! I'll take it point-by-point:
    
    - Performance Status: The suspicion I was referring to is that the Streams 
API generates more objects, and objects with more state, than iterators do. 
I have not been able to verify this. I could, by rewriting the PR to avoid the 
Streams API, but that would take a good bit of work and I think the result 
would be _much_ harder to read. I think the majority of the performance decline 
comes from the changes in design, not incidental stuff like this.
    
    - DatasetFactory: I just hadn't touched that yet to avoid touching anything 
outside of my code until we are happy with the basic machinery. I will happily 
put that in as per your proposal in that message. See a forthcoming commit. 
{grin}
    
    - Assembler: I'm still reading through the assembler subsystem and making 
sure I understand it before extending it. I should be able to supply something 
soon.
    
    - Documentation: Yes, I haven't done anything here. I assume you are 
talking about documentation on the website, right? Not Javadocs?
    
    - Persistent datastructures: Cool, I'll break these out and give them some 
tests.
    
    - Dependency management: Okay, I'll square that away. I didn't realize that 
$module/DEPENDENCIES even existed. What does that support?
    
    - (Not) Mocking in tests: I think that should be doable in most cases. I 
usually prefer mocking because it makes everything very explicit, but if Jena's 
habit is otherwise, I'm happy to follow it.
    
    - Warnings: Hm, didn't see those (except for the casts). I will make sure 
everything is clean. Those casts are really weird, because sometimes I have 
seen compilation fail without them, but they do seem to be unnecessary and the 
types are inferable. I will check into it more thoroughly.
    
    - Journaling: I think including it could be a good way also to get some 
feedback. Is there some Jena-standard way to mark something as "slightly 
experimental" a la Google's `@Beta` annotation?
    
    - LockMRPlusSW: I will slap some tests on that.
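
To illustrate the Streams-versus-iterators suspicion from the first point, here is a minimal sketch. It is not code from the PR: the class name, method names and sample data are hypothetical, and the allocation remarks in the comments describe how stream pipelines are typically built, not anything measured here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; nothing here is taken from the Jena code base.
public class StreamVsIterator {

    // Stream version: the pipeline is built from several objects
    // (a source stream, one stage per filter/map call, a collector),
    // which is the overhead the comment above suspects but has not verified.
    public static List<String> viaStream(List<String> subjects, String prefix) {
        return subjects.stream()
                .filter(s -> s.startsWith(prefix))
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Iterator version: the same selection and transformation using
    // one iterator and one result list.
    public static List<String> viaIterator(List<String> subjects, String prefix) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = subjects.iterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.startsWith(prefix)) {
                out.add(s.toUpperCase());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> subjects = Arrays.asList("ex:a", "ex:b", "other:c");
        // Both calls print the same list; only the machinery differs.
        System.out.println(viaStream(subjects, "ex:"));
        System.out.println(viaIterator(subjects, "ex:"));
    }
}
```

Both methods return the same result, so any difference between them is purely one of cost and readability, which is the trade-off described above.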


> Develop a new in-memory RDF Dataset implementation
> --------------------------------------------------
>
>                 Key: JENA-624
>                 URL: https://issues.apache.org/jira/browse/JENA-624
>             Project: Apache Jena
>          Issue Type: Improvement
>            Reporter: Andy Seaborne
>            Assignee: A. Soroka
>              Labels: gsoc, gsoc2015, java, linked_data, rdf
>
> The current (Jan 2014) Jena in-memory dataset uses a general purpose 
> container that works for any storage technology for graphs together with 
> in-memory graphs.  
> This project would develop a new implementation design specifically for RDF 
> datasets (triples and quads) and efficient SPARQL execution, for example, 
> using multi-core parallel operations and/or multi-version concurrent 
> datastructures to maximise true parallel operation.
> This is a system project suitable for someone interested in database 
> implementation, datastructure design and implementation, operating systems or 
> distributed systems.
> Note that TDB can operate in-memory using a simulated disk with 
> copy-in/copy-out semantics for disk-level operations.  It is for faithful 
> testing of TDB infrastructure and is not designed for performance, general 
> in-memory use or use at scale.  While lessons may be learnt from that system, TDB 
> in-memory is not the answer here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
