Hi Amindri,

You can ignore those WARNINGS. They simply tell you that a literal value failed to validate against its stated data type. I am not completely sure what Jena does with such triples, but I think it stores them in the triplestore anyway. When I imported Freebase I piped the log output to a grep that removed all lines containing "WARN jena.riot".
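If grep is not at hand, the same filtering can be sketched in a few lines of Python. This is only an illustration: the function name `drop_riot_warnings` and the sample lines are made-up placeholders; only the "WARN jena.riot" marker comes from the import described above.

```python
# Marker taken from the log output described above; adjust if your format differs.
MARKER = "WARN jena.riot"

def drop_riot_warnings(lines, marker=MARKER):
    """Yield only the lines that do not contain the marker (like `grep -v`)."""
    for line in lines:
        if marker not in line:
            yield line

# Placeholder lines, not real Jena output.
sample = [
    "some other log line\n",
    "WARN jena.riot <placeholder warning text>\n",
    "another log line\n",
]
kept = list(drop_riot_warnings(sample))
print("".join(kept), end="")
```

To use it as a filter in a pipe, pass `sys.stdin` instead of `sample` and write the kept lines to `sys.stdout`.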
You will also see some similar warnings during indexing (e.g. for invalid dates like the 31st of February ...). During indexing such values are stored as strings.

On Mon, Feb 23, 2015 at 3:06 AM, Amindri Udugala <amindriudug...@gmail.com> wrote:
> However I noticed that the indexing process uses up to 14 GB of ram and
> very little cpu (0% - 1%. Mostly it is 0%). Also does not seem to use any
> disk space at all. Is this something to be worried about?

Jena TDB uses memory mapped files. AFAIK it will use all the memory it can get for those.

CPU usage is expected to be minimal. Most of the time is spent in index lookups. For every triple Jena needs to look up the subject, predicate and object in the nodes table. After that it needs to look up the triple in the triple table. If any node or the triple does not exist, it needs to update the tables. So most of the time is spent in lookups and write operations. As soon as the tables get too big to be mapped in memory, things start to get slow. Depending on the hardware, even very slow ...

The WARN messages state the line number. When you do a line count on the source file you can easily determine how much of the dump you have already imported. You should also see log messages about the current import speed. Combining the two, you can estimate the remaining time.

best
Rupert

>
> Thanks
> Amindri
>
>
>
> On 13 February 2015 at 17:16, Amindri Udugala <amindriudug...@gmail.com>
> wrote:
>
>> Hi Rupert,
>>
>> The fix is in the indexing tool.
>> (entityhub/indexing/core/source/LineBasedEntityIterator.java). I created
>> the issue and submitted the patch.
>>
>> Yes Rupert, the problem was jena TDB is not importing the, Freebase dump.
>> The reason behind this was file name of my freebase data dump. It was named
>> as freebase_latest.gz, and JenaTDB was trying to map the extension of the
>> file with a map of Lang objects. (Check line no 61 in RdfResourceImporter).
>> Once I renamed my Freebase dump as freebase.rdf.gz, Jena TDB started to
>> import the data.
>>
>> Then again it threw a riot exception and now I'm running the fixit.pl
>> tool on the dump. Will keep you updated on how the indexing process will
>> turn out.
>>
>> Thanks for the valuable tips on indexing.
>>
>> Thanks
>> Amindri
>>
>>
>
>
> --
> Regards
> Amindri Udugala

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/
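
A minimal sketch of the progress estimate described in the reply above, assuming you have read the current line number off a recent WARN message, counted the lines of the source file, and taken the import speed from the log. It further assumes roughly one triple per line, so that a speed reported in triples per second can be treated as lines per second. The function name and all numbers are illustrative, not measurements.

```python
def estimate_remaining(current_line, total_lines, lines_per_second):
    """Return (fraction_done, remaining_seconds) for a line-based import."""
    if total_lines <= 0 or lines_per_second <= 0:
        raise ValueError("total_lines and lines_per_second must be positive")
    # Guard against a reported line number beyond the counted total.
    current_line = min(current_line, total_lines)
    fraction_done = current_line / total_lines
    remaining_seconds = (total_lines - current_line) / lines_per_second
    return fraction_done, remaining_seconds

# Illustrative numbers only: line number from a WARN message,
# line count of the dump, and the speed taken from the import log.
done, secs = estimate_remaining(250_000_000, 1_000_000_000, 25_000)
print(f"{done:.1%} imported, about {secs / 3600:.1f} hours remaining")
```

Since the import slows down once the tables no longer fit in memory, an estimate based on the current speed is likely to be optimistic; recomputing it from time to time gives a better picture.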