Re: [MarkLogic Dev General] Slow lexicon performance for limited user

Michael Blakeley Tue, 26 Mar 2013 15:54:49 -0700

I'm not sure exactly why this is happening. The document permissions are stored 
and indexed for the documents, separate from the range index values. So the 
permissions check has to look at the universal index to check those, and join 
that against the matching range index values. So if the document permissions 
part returned the entire database, more or less, then the join might be O(n) 
with the database size. An admin user would skip the document permissions 
check, so the index join would not be necessary.
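
One way to probe this theory is to compare how many fragments the limited user can see with how many of them hold the indexed element. This is only a rough sketch: the element name "element" and the user "limited" come from Will's earlier messages, and it assumes that xdmp:estimate applies the same permission terms that the lexicon call does. It has to be run by a user who is allowed to eval as another user.

```xquery
xquery version "1.0-ml";

(: Evaluate both estimates as the limited user, so that the
   document-permission terms are part of index resolution. :)
xdmp:eval('
  xquery version "1.0-ml";
  (: fragments visible to this user :)
  xdmp:estimate(cts:search(fn:doc(), cts:and-query(()))),
  (: visible fragments that contain the indexed element :)
  xdmp:estimate(cts:search(fn:doc(),
    cts:element-query(xs:QName("element"), cts:and-query(()))))',
  (),
  <options xmlns="xdmp:eval">
    <user-id>{ xdmp:user("limited") }</user-id>
  </options>)
```

If the first number is close to the size of the whole database and the second is small, that would be consistent with a join whose cost follows the database size rather than the range index size.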


But that seems odd because I'd expect the document permissions lookup to use 
the range index QName too. So if the range index data is only in a small subset 
of the database, that permissions lookup shouldn't match the rest of the 
database at all. And that lookup should be O(log n) with database size, not 
O(n).

If that's what is happening, it makes sense that an amp would work around it. 
But I think you should open a support case too. This might be a bug, or at 
least a good area for optimization work.
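
For the amp workaround, a minimal sketch might look like the library module below. The namespace, function name, and limit of 10 are hypothetical, and the options reflect Will's stated need for case- and diacritic-insensitive matching. The amp itself is not shown: it would be defined separately in the security configuration, pointing at this function and granting a role that bypasses the permissions check.

```xquery
xquery version "1.0-ml";

(: Hypothetical library module; the namespace and function name
   are placeholders for illustration only. :)
module namespace ac = "http://example.com/autocomplete";

(: Returns up to 10 lexicon values that start with $prefix.
   An amp defined on this function would raise privileges for
   this call only, not for the caller's whole request. :)
declare function ac:suggest($prefix as xs:string)
  as xs:anyAtomicType*
{
  cts:element-value-match(
    xs:QName("element"),
    $prefix || "*",
    ("case-insensitive", "diacritic-insensitive", "limit=10"))
};
```

Keeping the amped function this narrow limits what the elevated role can be used for.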

-- Mike
 
On 26 Mar 2013, at 12:57 , Will Thompson <[email protected]> wrote:

> That's correct. The values are from a range index that stores only
> autocomplete data, which is only a fraction (~12MB) of the total test
> database (~1GB). 
> 
> count(cts:element-values(xs:QName("element"))) => 97676
> 
> So very similar to your 100k document test. We ran this both with all
> the values stored in a single document and with one document per value,
> and the test results were the same. However, our test user role only has
> "read" not "update" permission on the documents.
> 
> -Will
> 
> 
> On 3/26/13 12:19 PM, "Michael Blakeley" <[email protected]> wrote:
> 
>> So your database size is *not* proportional to the number of values?
>> 
>> How many values do you have?
>> 
>> -- Mike
>> 
>> On 26 Mar 2013, at 11:27 , Will Thompson <[email protected]>
>> wrote:
>> 
>>> After further testing, the latency increase for the non-admin user
>>> appears to be proportional to database size (note: NOT range index
>>> size). We bootstrapped a fresh database, and with only some test
>>> documents loaded the query speeds were virtually identical. However,
>>> as content is loaded into the db, query time for the non-admin user
>>> essentially doubles every time the database size doubles.
>>> 
>>> -Will
>>> 
>>> 
>>> On 3/25/13 5:12 PM, "Michael Blakeley" <[email protected]> wrote:
>>> 
>>>> For reference, here's what I tried. I only created 100,000 documents,
>>>> and they are very small.
>>>> 
>>>> (: setup :)
>>>> (1 to 100 * 1000) ! (
>>>> xdmp:document-insert(
>>>>  '/test/'||.,
>>>>  element test {
>>>>    attribute id { . },
>>>>    element a { xdmp:integer-to-hex(xdmp:random()) } },
>>>>  ('read', 'update') ! xdmp:permission('test', .))),
>>>> xdmp:elapsed-time()
>>>> 
>>>> That takes about 30 seconds on my laptop.
>>>> 
>>>> (: test - admin :)
>>>> (for $i in 1 to 1000
>>>> return cts:element-value-match(xs:QName("a"), "e*",
>>>> "limit=1"))[last()],
>>>> xdmp:elapsed-time()
>>>> =>
>>>> e0003170ed3130a4
>>>> PT0.084827S
>>>> 
>>>> (: test - user :)
>>>> xdmp:eval('
>>>> (for $i in 1 to 1000
>>>> return cts:element-value-match(
>>>>   xs:QName("a"), "e*", "limit=1"))[last()]',
>>>> (),
>>>> <options xmlns="xdmp:eval">
>>>>  <user-id>{ xdmp:user("test") }</user-id>
>>>> </options>),
>>>> xdmp:elapsed-time()
>>>> =>
>>>> e0003170ed3130a4
>>>> PT0.07878S
>>>> 
>>>> In this particular run the non-admin user was faster - but that is
>>>> probably a caching effect, and anyway the difference was not
>>>> significant.
>>>> I'm using 6.0-2.1 on OS X, running the queries in cq.
>>>> 
>>>> There are about 6700 values that match 'e*'. According to the profiler,
>>>> about 50% of the elapsed time is spent in the cts:element-value-match
>>>> call. The rest is split between the FLWOR and the predicate on last().
>>>> 
>>>> -- Mike
>>>> 
>>>> On 25 Mar 2013, at 15:15 , Will Thompson <[email protected]>
>>>> wrote:
>>>> 
>>>>> I've tested this on 6.0-2 (OS X) and 6.0-2.2 (Windows), and both have
>>>>> the same issue. The xdmp:plan output is the same under both users.
>>>>> Maybe I should try creating a more isolated test case...
>>>>> 
>>>>> -Will
>>>>> 
>>>>> 
>>>>> On 3/25/13 2:53 PM, "Michael Blakeley" <[email protected]> wrote:
>>>>> 
>>>>>> An amp shouldn't really be necessary, but it's puzzling that you see
>>>>>> such a large difference. I tried to set up a similar test with some
>>>>>> data I had handy, and saw a difference of less than 5% between admin
>>>>>> and non-admin users.
>>>>>> 
>>>>>> Which release are you using?
>>>>>> 
>>>>>> -- Mike
>>>>>> 
>>>>>> On 25 Mar 2013, at 14:20 , Will Thompson <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Mike - It seems to ignore the query-trace (inside or outside eval),
>>>>>>> but I suspect you're right. Unfortunately this is dramatic enough to
>>>>>>> be the difference between a usable and unusable autocomplete
>>>>>>> solution, in which we're squeezing as much as we can into a limited
>>>>>>> time budget. We will need to run the query as case- and
>>>>>>> diacritic-insensitive.
>>>>>>> 
>>>>>>> Will I need to amp this operation to run under the admin role to be
>>>>>>> performant?
>>>>>>> 
>>>>>>> -W
>>>>>>> 
>>>>>>> 
>>>>>>> On 3/25/13 1:54 PM, "Michael Blakeley" <[email protected]> wrote:
>>>>>>> 
>>>>>>>> The non-admin user should be checking extra query terms, to enforce
>>>>>>>> the read permissions it has through its roles. That might be enough
>>>>>>>> to explain the difference. I think the extra terms will show up in
>>>>>>>> an xdmp:query-trace, if you want to verify that.
>>>>>>>> 
>>>>>>>> You might also try the 'diacritic-sensitive' and 'case-sensitive'
>>>>>>>> options. That should speed up the value-matching a bit.
>>>>>>>> 
>>>>>>>> -- Mike
>>>>>>>> 
>>>>>>>> On 25 Mar 2013, at 13:40 , Will Thompson
>>>>>>>> <[email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I ran this in a loop 100 times for the limited user and for admin,
>>>>>>>>> and the limited user was roughly 50X slower than admin:
>>>>>>>>> 
>>>>>>>>> xdmp:eval(concat(
>>>>>>>>> 'xquery version "1.0-ml";',
>>>>>>>>> 'cts:element-value-match(xs:QName("element"), "value*",
>>>>>>>>> "limit=1")'),
>>>>>>>>> (),
>>>>>>>>> <options xmlns="xdmp:eval">
>>>>>>>>> <user-id>{ xdmp:user("limited") }</user-id>
>>>>>>>>> </options>)
>>>>>>>>> 
>>>>>>>>> The limited user has a role with read permissions on the documents
>>>>>>>>> containing those values (obviously, since it returns non-empty
>>>>>>>>> results), and also has the app-user role. Otherwise, this user has
>>>>>>>>> no other roles. With log level = debug, nothing really jumps out at
>>>>>>>>> me. I only see occasional "InMemoryStand", "OnDiskStand", and
>>>>>>>>> "Saving" messages, and they appear regardless of the user running
>>>>>>>>> the query.
>>>>>>>>> 
>>>>>>>>> -Will
>>>>>>>>> _______________________________________________
>>>>>>>>> General mailing list
>>>>>>>>> [email protected]
>>>>>>>>> http://developer.marklogic.com/mailman/listinfo/general
