Thanks Andy, I think it confirms our current understanding.
From: Andy Seaborne <[email protected]>
To: [email protected]
Date: 02/02/2013 08:44 AM
Subject: Re: TDB and length limits for string objects?

On 31/01/13 14:58, Simon Helsen wrote:
> Hi guys,
>
> I have a generic question about having large strings as objects in
> triples. It is not entirely clear to me what the ramifications are if TDB
> indexes triples with very large objects (typically of some string type).
> We currently have an internal discussion about this because it seems that
> in the past we essentially blocked triples with a very large string object
> to end up in TDB in the first place (right now, the artificial limit is a
> string length of 1024). In most cases, clients would not put any fancy
> filters on such large strings in their sparql query, but they would still
> want to retrieve the large string object. Still, even in this use-case, it
> is not clear how this would negatively affect the performance both in
> terms of memory and cpu.

TDB has no internal limits on literal lexical form length.

It does not affect indexing (indexes are on NodeId - fixed length 8 bytes).

It does affect loading (more bytes!) and the total system resources (beware of OOME).

If 1K+ literals form a significant amount of the database, it will be slower as the accumulation of all the costs.

But the "not in most cases" case can be very bad - searching them by regex is expensive, as I bet that may be "find all such that regex". Consider using an additional index, LARQ style.

If the clients really are just storing large literals in RDF, not searching, then consider putting in an indirection and keeping them in a key-value blob store.

You will need to try it to know the exact impact in your usage.

1024 is not really that large.

Andy
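
For illustration, here is a minimal Python sketch of the indirection idea mentioned above: a large literal is moved to a key-value store and the triple keeps only a short reference. This is not TDB or Jena API. The names `BlobStore`, `externalize`, `resolve`, the `urn:blob:sha256:` prefix, the in-memory dict, and the reuse of 1024 as the cut-off are all assumptions made for the example.

```python
import hashlib

# Assumed cut-off; 1024 is the limit mentioned in the thread, not a TDB setting.
THRESHOLD = 1024
# Hypothetical reference scheme for externalized literals.
BLOB_PREFIX = "urn:blob:sha256:"


class BlobStore:
    """In-memory stand-in for a real key-value blob store."""

    def __init__(self):
        self._data = {}

    def put(self, text):
        # Content-addressed key: identical literals are stored once.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self._data[key] = text
        return key

    def get(self, key):
        return self._data[key]


def externalize(triple, store, threshold=THRESHOLD):
    """Replace an over-long string object with a short blob reference."""
    s, p, o = triple
    if isinstance(o, str) and len(o) > threshold and not o.startswith(BLOB_PREFIX):
        return (s, p, BLOB_PREFIX + store.put(o))
    return triple


def resolve(obj, store):
    """Return the original literal for a blob reference, else obj unchanged."""
    if isinstance(obj, str) and obj.startswith(BLOB_PREFIX):
        return store.get(obj[len(BLOB_PREFIX):])
    return obj


if __name__ == "__main__":
    store = BlobStore()
    stored = externalize(("ex:doc1", "ex:body", "lorem " * 1000), store)
    print(stored[2])                       # short reference kept in the triple
    print(len(resolve(stored[2], store)))  # full literal fetched on demand
```

The trade-off is that the RDF store then holds only small, fixed-size references, while clients that want the full string pay one extra lookup. SPARQL filters can no longer see the literal text, which is fine only if, as described, nobody searches on it.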
