Re: TDB and length limits for string objects?

Andy Seaborne Sat, 02 Feb 2013 05:44:40 -0800

On 31/01/13 14:58, Simon Helsen wrote:

Hi guys,


I have a generic question about having large strings as objects in
triples. It is not entirely clear to me what the ramifications are if TDB
indexes triples with very large objects (typically of some string type).
We currently have an internal discussion about this because it seems that
in the past we essentially blocked triples with a very large string object
to end up in TDB in the first place (right now, the artificial limit is a
string length of 1024). In most cases, clients would not put any fancy
filters on such large strings in their sparql query, but they would still
want to retrieve the large string object. Still, even in this use-case, it
is not clear how this would negatively affect the performance both in
terms of memory and cpu.

TDB has no internal limits on literal lexical form length. It does not
affect indexing (indexes are on NodeId - fixed length 8 bytes). It does
affect loading (more bytes!) and the total system resources (beware of
OOME). If 1K+ literals form a significant amount of the database, it
will be slower as the accumulation of all the costs.
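To illustrate why literal length does not change index entry size, here is a minimal sketch in Python. It is not TDB's actual code: the `NodeTable` class and `index_entry` function are made up for illustration, and the only detail taken from the above is that index entries are built from fixed-length 8-byte ids.

```python
import struct


class NodeTable:
    """Hypothetical node table: maps each RDF term to a fixed 8-byte id."""

    def __init__(self):
        self._ids = {}    # term -> integer id
        self._nodes = []  # integer id -> term

    def node_id(self, node: str) -> bytes:
        # Assign the next integer id the first time a term is seen.
        if node not in self._ids:
            self._ids[node] = len(self._nodes)
            self._nodes.append(node)
        # Always 8 bytes, whatever the length of the term.
        return struct.pack(">Q", self._ids[node])

    def node(self, node_id: bytes) -> str:
        return self._nodes[struct.unpack(">Q", node_id)[0]]


def index_entry(table: NodeTable, s: str, p: str, o: str) -> bytes:
    """An index entry is three ids: 24 bytes regardless of literal size."""
    return table.node_id(s) + table.node_id(p) + table.node_id(o)
```

The long string is stored once in the node table, so it costs bytes at load time and when it is read back, but the index entries stay the same size.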

But the "not in most cases" case can be very bad - searching them by
regex is expensive, as I bet the query may be "find all such that
regex", which has to test every literal. Consider using an additional
index, LARQ style.
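A rough sketch in Python of the difference between a regex scan and an additional index. This is not LARQ or its API: the function names and the simple word-level inverted index are assumptions chosen only to show the idea.

```python
import re
from collections import defaultdict


def regex_scan(literals: dict, pattern: str) -> list:
    """'Find all such that regex': every literal is tested in full."""
    rx = re.compile(pattern)
    return sorted(key for key, text in literals.items() if rx.search(text))


def build_index(literals: dict) -> dict:
    """Hypothetical additional index: lower-cased word -> keys containing it."""
    index = defaultdict(set)
    for key, text in literals.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(key)
    return index


def lookup(index: dict, word: str) -> list:
    """One dictionary lookup; the large literals are not re-read."""
    return sorted(index.get(word.lower(), ()))
```

The scan cost grows with the total number of bytes in all literals on every query, while the index pays that cost once, when it is built.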

If the clients really are just storing large literals in RDF, not
searching, then consider putting in an indirection and keeping them in a
KeyValue blob store.
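One way the indirection could look, as a minimal Python sketch. Everything here is an assumption for illustration: the in-memory `BlobStore`, the `urn:blob:sha256:` reference prefix, and the 1024 threshold (taken from the limit mentioned in the question) are not part of TDB or Jena.

```python
import hashlib

BLOB_PREFIX = "urn:blob:sha256:"  # made-up reference scheme
THRESHOLD = 1024                  # the limit mentioned in the question


class BlobStore:
    """Hypothetical in-memory stand-in for a real key-value blob store."""

    def __init__(self):
        self._blobs = {}

    def put(self, text: str) -> str:
        # Content hash as key: identical strings are stored only once.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self._blobs[key] = text
        return key

    def get(self, key: str) -> str:
        return self._blobs[key]


def externalize(triple, store, threshold=THRESHOLD):
    """Replace an oversized string object with a short reference."""
    s, p, o = triple
    if isinstance(o, str) and len(o) > threshold:
        return (s, p, BLOB_PREFIX + store.put(o))
    return triple


def resolve(obj, store):
    """Fetch the original string if obj is a blob reference."""
    # Caveat: a short literal that itself starts with BLOB_PREFIX would
    # be mistaken for a reference; a real design needs a distinct type.
    if isinstance(obj, str) and obj.startswith(BLOB_PREFIX):
        return store.get(obj[len(BLOB_PREFIX):])
    return obj
```

The triple store then only ever sees a short, fixed-size reference, and clients that want the text call `resolve` on the object after the query returns.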

You will need to try it to know the exact impact in your usage. 1024 is
not really that large.


        Andy
