Re: TDB and length limits for string objects?

Simon Helsen Mon, 04 Feb 2013 07:14:23 -0800

thanks Andy,

I think it confirms our current understanding.

From: Andy Seaborne <[email protected]>
To: [email protected]
Date: 02/02/2013 08:44 AM
Subject: Re: TDB and length limits for string objects?

On 31/01/13 14:58, Simon Helsen wrote:
> Hi guys,
>
> I have a generic question about having large strings as objects in
> triples. It is not entirely clear to me what the ramifications are if TDB
> indexes triples with very large objects (typically of some string type).
> We currently have an internal discussion about this because it seems that
> in the past we essentially blocked triples with a very large string object
> from ending up in TDB in the first place (right now, the artificial limit
> is a string length of 1024). In most cases, clients would not put any fancy
> filters on such large strings in their SPARQL query, but they would still
> want to retrieve the large string object. Still, even in this use-case, it
> is not clear how this would negatively affect the performance both in
> terms of memory and CPU.

TDB has no internal limits on literal lexical form length. Literal length 
does not affect indexing (indexes are on NodeIds, which are a fixed 8 
bytes). It does affect loading (more bytes!) and the total system 
resources (beware of OOME). If 1K+ literals form a significant share of 
the database, it will be slower because all of those costs accumulate.
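
To illustrate why literal length does not reach the indexes, here is a toy model in Python. It is not TDB's actual code: the `NodeTable` class, its in-memory dict, and the sequential ids are all assumptions made for illustration. The only detail taken from the point above is that index entries are built from fixed-length 8-byte NodeIds.

```python
import struct


class NodeTable:
    """Toy node table (hypothetical): maps each RDF term to a sequential id."""

    def __init__(self):
        self._id_of = {}    # term -> integer id
        self._term_of = []  # integer id -> term

    def node_id(self, term):
        # Allocate a new id the first time a term is seen.
        if term not in self._id_of:
            self._id_of[term] = len(self._term_of)
            self._term_of.append(term)
        return self._id_of[term]

    def term(self, node_id):
        return self._term_of[node_id]


def index_entry(table, s, p, o):
    # Three ids packed as big-endian unsigned 64-bit integers: always 24 bytes,
    # whatever the length of the terms they stand for.
    return struct.pack(">QQQ", table.node_id(s), table.node_id(p), table.node_id(o))


table = NodeTable()
short_entry = index_entry(table, "ex:doc1", "ex:title", "short")
long_entry = index_entry(table, "ex:doc1", "ex:body", "x" * 100_000)
```

Both entries come out the same size; the 100,000-character string is only held in the node table, which is where the extra loading and memory cost shows up.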

But the "not in most cases" case can be very bad: searching large 
literals by regex is expensive, as I bet the query will be "find all such 
that regex", which has to test every candidate literal. Consider using an 
additional index, LARQ style.
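
As a rough sketch of the difference between a "find all such that regex" scan and an additional index, the Python below compares the two on a few made-up literals. It is an illustration only: LARQ builds on Lucene, whereas the `literals` dict, `build_word_index`, and the word-level matching here are hypothetical stand-ins that do not reflect its API.

```python
import re
from collections import defaultdict

# Hypothetical literal store: id -> lexical form.
literals = {
    1: "TDB stores triples in fixed-width indexes",
    2: "a long description that mentions regex filters",
    3: "another description of TDB loading costs",
}


def regex_scan(pattern):
    # Tests the pattern against every literal: cost grows with the number
    # of literals and with their length.
    rx = re.compile(pattern)
    return sorted(k for k, text in literals.items() if rx.search(text))


def build_word_index(docs):
    # Built once up front: lower-cased word -> set of literal ids.
    index = defaultdict(set)
    for k, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(k)
    return index


word_index = build_word_index(literals)


def index_lookup(word):
    # A single dictionary lookup instead of a pass over all literals.
    return sorted(word_index.get(word.lower(), ()))
```

The trade-off is that the index answers whole-word lookups only and has to be kept in step with the data, while the scan accepts any pattern but pays for it on every query.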

If the clients really are just storing large literals in RDF, not 
searching, then consider putting in an indirection and keeping them in a 
key-value blob store.
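
A minimal sketch of that indirection, in Python and under stated assumptions: the in-memory `blob_store` dict stands in for whatever key-value store is actually used, the `blob:` key scheme and SHA-256 content hashing are illustrative choices, and the `triples` set stands in for the RDF store. None of these names come from TDB or Jena.

```python
import hashlib

blob_store = {}  # hypothetical key-value store: key -> bytes
triples = set()  # stand-in for the RDF store: (subject, predicate, object)


def put_blob(text):
    # Content-addressed key, so identical strings are stored only once.
    data = text.encode("utf-8")
    key = "blob:" + hashlib.sha256(data).hexdigest()
    blob_store[key] = data
    return key


def add_large_literal(s, p, text):
    # The triple carries only the short key; the large string lives outside.
    key = put_blob(text)
    triples.add((s, p, key))
    return key


def get_large_literal(s, p):
    # Resolve the indirection: find the key in the triples, then fetch the blob.
    for ts, tp, key in triples:
        if (ts, tp) == (s, p):
            return blob_store[key].decode("utf-8")
    return None


body = "lorem ipsum " * 10_000
key = add_large_literal("ex:doc1", "ex:body", body)
```

Clients that only retrieve the string pay one extra lookup, and the triple store itself never sees more than the 69-character key.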

You will need to try it to know the exact impact in your usage. 1024 is 
not really that large.

                 Andy
