About the tests failing: strange... I don't see any failures:
Tests run: 41, Failures: 0, Errors: 0, Skipped: 0
Share details on your failures, I might have a look (but not today).
If you are keen, you can look at EARQ as well, which is not just about
ElasticSearch. It was done to experiment with a refactoring which made it
easier to plug in different indexes (and indeed EARQ has Lucene, Solr and
ElasticSearch in it):
https://github.com/castagna/EARQ
Paolo
Anuj Kumar wrote:
Sure, I will let you know if I have any queries. The tests were failing
when I built SARQ on my machine, but I will look into it later.
As you mentioned, it is really good to understand the integration with
LARQ as a reference, so I am doing that.
Thanks for the info.
- Anuj
On Thu, Mar 17, 2011 at 1:14 PM, Paolo Castagna <[email protected]> wrote:
Anuj Kumar wrote:
Thanks Paolo. I am looking into LARQ and also SARQ.
Be warned: SARQ is just an experiment (and currently unsupported).
However, if you prefer to use Solr, share with us your use case and your
reasons, and let me know if you have problems with it.
SARQ might be a little bit behind in relation to removals from the index,
but you can look at what LARQ does and port the same approach into SARQ.
Paolo
On Thu, Mar 17, 2011 at 12:18 AM, Paolo Castagna <[email protected]> wrote:
Anuj Kumar wrote:
Hi Andy,
I have loaded a few N-Triples files into TDB in offline mode using
tdbloader. Loading as well as querying is fast, but if I try to use a
regex it becomes very slow, taking a few minutes. On my 32-bit machine it
takes more than 10 mins (expected, due to limited memory ~1.5GB) and on my
64-bit machine (8GB) it takes around 5 mins.
The query is pretty exhaustive; correct me if it is happening due to the
filter:
SELECT ?abstract
WHERE {
  ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?l .
  FILTER regex(?l, "Futurama", "i") .
  ?resource <http://dbpedia.org/ontology/abstract> ?abstract
}
I have loaded a few abstracts from the DBpedia dump and I am trying to
get the abstracts from the label. This is very slow. If I remove the
FILTER and give the exact label, it is fast (presumably because of TDB
indexing).
What is the right way to do such a regex search or text search over the
graph? I have seen suggestions to use Lucene and I also saw the LARQ
initiative. Is that the right way to go?
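For reference, the fast exact-label variant I mean looks like the query
below (the @en language tag is an assumption; DBpedia labels are usually
language-tagged, so a plain literal may not match):

SELECT ?abstract
WHERE {
  ?resource <http://www.w3.org/2000/01/rdf-schema#label> "Futurama"@en .
  ?resource <http://dbpedia.org/ontology/abstract> ?abstract
}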
Yes, using LARQ (which is included in ARQ) will greatly speed up your
query. LARQ documentation is here:
http://jena.sourceforge.net/ARQ/lucene-arq.html
You will need to build the Lucene index first, though.
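A minimal sketch of the two steps, following the LARQ documentation above
(the package and class names are from the ARQ of that era; treat the
exact details as an assumption and check the page above):

import com.hp.hpl.jena.query.larq.IndexBuilderString;
import com.hp.hpl.jena.query.larq.IndexLARQ;
import com.hp.hpl.jena.query.larq.LARQ;

// model is your existing (e.g. TDB-backed) model
IndexBuilderString larqBuilder = new IndexBuilderString();
larqBuilder.indexStatements(model.listStatements()); // index string literals
larqBuilder.closeWriter();
IndexLARQ index = larqBuilder.getIndex();
LARQ.setDefaultIndex(index); // make the index visible to query execution

The slow FILTER regex query can then be rewritten to use the pf:textMatch
property function, which consults the Lucene index instead of scanning
every label:

PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
SELECT ?abstract
WHERE {
  ?l pf:textMatch "Futurama" .
  ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?l .
  ?resource <http://dbpedia.org/ontology/abstract> ?abstract
}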
Paolo
Thanks,
Anuj
On Tue, Mar 15, 2011 at 5:09 PM, Andy Seaborne <[email protected]> wrote:
Just so you know: the TDB bulkloader can load all the data offline - it's
faster than using Fuseki for data loading online.
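For example (the paths are illustrative; check the TDB documentation for
the exact options):

tdbloader --loc=/path/to/DB dbpedia.nt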
Andy
On 15/03/11 11:22, Anuj Kumar wrote:
Hi Andy,
Thanks for the info. I have loaded a few GBs using Fuseki Server but I
didn't try RiotReader or the Java APIs for TDB. Will try that.
Thanks for the response.
Regards,
Anuj
On Tue, Mar 15, 2011 at 4:12 PM, Andy Seaborne <[email protected]> wrote:
1/ Have you considered reading the DBpedia data into TDB? This would keep
the triples on-disk (and have cached in-memory versions of a subset).
2/ A file can be read sequentially by using the parser directly (see
RiotReader and pass in a Sink<Triple> that processes the stream of
triples); there is a sketch combining the two below.
Andy
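A rough sketch combining the two suggestions: stream an N-Triples file
with RiotReader and copy just the triples you care about into a
TDB-backed model. The file and directory names, the filter, and the exact
RiotReader.parseTriples signature are assumptions based on the RIOT/TDB
APIs of that era:

import org.openjena.atlas.lib.Sink;
import org.openjena.riot.RiotReader;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;

public class StreamIntoTDB {
    public static void main(String[] args) {
        // TDB keeps the triples on disk; "DB" is an illustrative directory
        final Model model = TDBFactory.createModel("DB");

        Sink<Triple> sink = new Sink<Triple>() {
            public void send(Triple triple) {
                // inspect each triple as it streams past and keep only the
                // ones to map; this label-only filter is just an example
                if (triple.getPredicate().isURI()
                        && triple.getPredicate().getURI().equals(
                               "http://www.w3.org/2000/01/rdf-schema#label")) {
                    model.getGraph().add(triple);
                }
            }
            public void flush() {}
            public void close() {}
        };

        // parse sequentially; the whole file is never held in memory
        RiotReader.parseTriples("dbpedia.nt", sink);
        sink.close();
        model.close();
    }
}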
On 14/03/11 18:42, Anuj Kumar wrote:
Hi All,
I am new to Jena and trying to explore it to work with a large number of
N-Triples. The requirement is to read a large number of N-Triples, for
example an .nt file from the DBpedia dump that may run into GBs. I have
to read these triples, pick specific ones and further link them to the
resources of another set of triples. The goal is to link some of the
entities based on the Linked Data concept. Once the mapping is done, I
have to query the model from that point onwards. I don't want to work by
loading both the source and target datasets in-memory.
To achieve this, I have first created a file model maker and then a named
model for the specific dataset being mapped. Now, I need to read the
triples and add the mapping to this new model. What should be the right
approach?
One way is to load the model using FileManager, iterate through the
statements and map them accordingly to the named model (i.e. our mapped
model), and at the end close it. This will work, but it will load all of
the triples in memory. Is this the right way to proceed, or is there a
way to read the model sequentially at the time of mapping?
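For concreteness, the in-memory approach I describe would look roughly
like this (the file name and the mappedModel variable are illustrative;
as noted, loadModel pulls the whole file into memory):

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.util.FileManager;

// loads the entire file into an in-memory model
Model source = FileManager.get().loadModel("dbpedia.nt");
StmtIterator it = source.listStatements();
while (it.hasNext()) {
    Statement stmt = it.nextStatement();
    // decide whether to keep this statement, then map it into the
    // named model created with the model maker
    mappedModel.add(stmt);
}
it.close();
source.close();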
Just trying to understand the efficient way to map a large set of
N-Triples. Need your suggestions.
Thanks,
Anuj