Re: [arangodb-google] Simple query, pretty slow

Wilfried Gösgens Fri, 14 Jun 2019 04:05:04 -0700

Hi, 
Can you share a set of sample documents? How are the values of
`_type` distributed? Which `_type` values are there?
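One way to answer the distribution question from a sample export is sketched below in Python, the language of the thread's client (PyArango). The sample documents are hypothetical placeholders, not data from this thread. On the server, the same tally can be expressed in AQL as `FOR doc IN import COLLECT t = doc._type WITH COUNT INTO n RETURN { type: t, total: n }`.

```python
from collections import Counter

# Hypothetical sample documents; real ones would come from an export of
# the `import` collection.
sample_docs = [
    {"_key": "k1", "_type": "Image", "_path": "/site/img/a.png"},
    {"_key": "k2", "_type": "Document", "_path": "/site/docs/b"},
    {"_key": "k3", "_type": "Image", "_path": "/site/img/c.png"},
    {"_key": "k4", "_type": "Folder", "_path": "/site/img"},
]


def type_distribution(docs):
    """Return (type, count, share) tuples, most frequent type first."""
    counts = Counter(doc.get("_type") for doc in docs)
    total = sum(counts.values())
    return [(t, n, n / total) for t, n in counts.most_common()]


for t, n, share in type_distribution(sample_docs):
    # str(t) so that documents without a _type (None) still print
    print(f"{str(t):10} {n:6} {share:7.1%}")
```

With the figures from the thread (about 21,000 `Image` documents out of about 100,000), the top entry would show a share of roughly 21%, so a lookup for `Image` matches a large part of the collection.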
On Friday, June 14, 2019 at 11:22:51 AM UTC+2, Andreas Jung wrote:
>
> Recreating the indexes after import does not make a difference.
>
> Returning doc._type for 20,000 items takes 50 ms; returning doc._path
> takes minutes.
>
> The _path index is deduplicated; the _type index is not.
>
> The only difference between the two execution plans is the "index only"
> annotation when returning doc._type. Since both _type and _path are fully
> indexed, I would assume that in both cases the query is answered from
> index data.
>
> So ArangoDB loads all 100,000 documents just to pick up the value of
> _path? The overall data is now 55 GB (about one third of it is binary
> data: files and images, base64-encoded).
>
> This is not a big problem for me, since we run such queries only once
> before a migration run, and a few minutes more or less do not matter for
> a migration that runs for several hours. But I want to understand what is
> going on here, in particular because this is unexpected behavior.
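A back-of-envelope sketch of how much data the `_path` query may have to read if, as the missing "index only" annotation suggests, every matching document has to be fetched. It assumes, purely for illustration, that documents are of roughly average size (55 GB over about 100,000 documents, decimal units); the real size of the `Image` documents is not known from this thread and may well be larger, since they carry the base64 data. The match count and runtime are taken from the profile of the first message in the thread.

```python
# Figures quoted in this thread. They come from different messages
# (the 55 GB figure was given later than the profile), so only the order
# of magnitude is meaningful. Uniform document size is an assumption.
TOTAL_BYTES = 55 * 10**9       # 55 GB overall, as stated in the thread
TOTAL_DOCS = 100_000           # approximate collection size
MATCHING_DOCS = 20_617         # items produced by the IndexNode in the profile
INDEX_NODE_SECONDS = 32.73956  # IndexNode runtime in the profile

avg_doc_bytes = TOTAL_BYTES / TOTAL_DOCS
bytes_read = avg_doc_bytes * MATCHING_DOCS
throughput = bytes_read / INDEX_NODE_SECONDS

print(f"average document size:    {avg_doc_bytes / 1e3:.0f} kB")
print(f"data behind the matches:  {bytes_read / 1e9:.1f} GB")
print(f"implied read throughput:  {throughput / 1e6:.0f} MB/s")
```

Even at this rough level, the estimate (about 11 GB behind the matches) suggests that reading full documents rather than index entries could plausibly account for most of the IndexNode runtime; it does not prove it.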
>
>
> Query String:
>  for doc in import 
>  filter doc._type == 'Image'
>  return doc._type
>
> Execution plan:
>  Id   NodeType          Est.   Comment
>   1   SingletonNode        1   * ROOT
>   7   IndexNode         2214     - FOR doc IN import   /* hash index scan, index only, projections: `_type` */
>   5   CalculationNode   2214       - LET #3 = doc.`_type`   /* attribute expression */   /* collections used: doc : import */
>   6   ReturnNode        2214       - RETURN #3
>
> Indexes used:
>  By   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
>   7   hash   import       false    false         0.05 %   [ `_type` ]   (doc.`_type` == "Image")
>
> Optimization rules applied:
>  Id   RuleName
>   1   move-calculations-up
>   2   move-filters-up
>   3   move-calculations-up-2
>   4   move-filters-up-2
>   5   use-indexes
>   6   remove-filter-covered-by-index
>   7   remove-unnecessary-calculations-2
>   8   reduce-extraction-to-projection
>
>
>
> Query String:
>  for doc in import 
>  filter doc._type == 'Image'
>  return doc._path
>
> Execution plan:
>  Id   NodeType          Est.   Comment
>   1   SingletonNode        1   * ROOT
>   7   IndexNode         2214     - FOR doc IN import   /* hash index scan, projections: `_path` */
>   5   CalculationNode   2214       - LET #3 = doc.`_path`   /* attribute expression */   /* collections used: doc : import */
>   6   ReturnNode        2214       - RETURN #3
>
> Indexes used:
>  By   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
>   7   hash   import       false    false         0.05 %   [ `_type` ]   (doc.`_type` == "Image")
>
> Optimization rules applied:
>  Id   RuleName
>   1   move-calculations-up
>   2   move-filters-up
>   3   move-calculations-up-2
>   4   move-filters-up-2
>   5   use-indexes
>   6   remove-filter-covered-by-index
>   7   remove-unnecessary-calculations-2
>   8   reduce-extraction-to-projection
>
>
>
>
> On Friday, June 14, 2019 at 9:54:10 AM UTC+2, Andreas Jung wrote:
>>
>> Using RocksDB (default installation).
>>
>> I create a new collection for every import of the data including the 
>> indexes.
>>
>> Unfortunately, the key names are not under my control. They come
>> from a JSON dump of a CMS.
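If the attribute names cannot be changed in the CMS dump, one option (not discussed in this thread, so treat it as an assumption) is to rename them in a preprocessing step before the import. The sketch below keeps ArangoDB's system attributes (`_key`, `_id`, `_rev`, `_from`, `_to`) and maps every other top-level attribute starting with `_` to a hypothetical `cms_` prefix; nested attributes are left untouched.

```python
# ArangoDB system attributes keep their leading underscore.
SYSTEM_ATTRIBUTES = {"_key", "_id", "_rev", "_from", "_to"}


def rename_underscore_fields(doc, prefix="cms_"):
    """Return a copy of doc with non-system top-level '_x' keys renamed.

    '_x' becomes prefix + 'x'. The 'cms_' default is a hypothetical choice.
    Raises ValueError if a new name would clash with an existing attribute.
    """
    renamed = {}
    for name, value in doc.items():
        if name.startswith("_") and name not in SYSTEM_ATTRIBUTES:
            new_name = prefix + name[1:]
            if new_name in doc:
                raise ValueError(f"renaming {name!r} would clash with {new_name!r}")
            renamed[new_name] = value
        else:
            renamed[name] = value
    return renamed


doc = {"_key": "k1", "_type": "Image", "_path": "/img/a.png"}
print(rename_underscore_fields(doc))
# {'_key': 'k1', 'cms_type': 'Image', 'cms_path': '/img/a.png'}
```

Queries and indexes would then use `cms_type` and `cms_path` instead of `_type` and `_path`.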
>>
>> On Friday, June 14, 2019 at 9:50:41 AM UTC+2, Wilfried Gösgens wrote:
>>>
>>> Hi, 
>>> AFAIR you're using RocksDB?
>>>
>>> Can you try re-creating that index on `_type`, `_path`, `_key`, to make
>>> better use of projections?
>>>
>>> Please note that you shouldn't use attribute names starting with `_`,
>>> since those are reserved for system attributes in ArangoDB.
>>>
>>> Cheers, 
>>> Willi
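A toy Python model of the idea behind the combined-index suggestion above: when every projected attribute is stored in the index entries, a query can be answered from the index alone ("index only" in the plan); otherwise each match needs a full document lookup. This is an in-memory illustration with hypothetical names (`ToyCollection`, `fetch_count`), not how ArangoDB or RocksDB actually store indexes; whether ArangoDB 3.4 uses a real combined index this way should be checked in the explain output by looking for "index only".

```python
from collections import defaultdict


class ToyCollection:
    """In-memory stand-in for a collection with a single hash-style index.

    Hypothetical illustration only, not ArangoDB's storage layout. Each
    index entry stores the values of the indexed fields plus _key, which
    is used to look up the full document when needed.
    """

    def __init__(self, docs, index_fields):
        self.docs = {d["_key"]: d for d in docs}
        self.index_fields = tuple(index_fields)
        self.index = defaultdict(list)  # value of first field -> entries
        for d in docs:
            entry = {f: d.get(f) for f in self.index_fields}
            entry["_key"] = d["_key"]
            self.index[d.get(self.index_fields[0])].append(entry)
        self.fetch_count = 0  # number of full-document lookups so far

    def query(self, value, projection):
        """Mimics FILTER doc.<first field> == value RETURN <projection>."""
        covered = all(f in self.index_fields for f in projection)
        results = []
        for entry in self.index.get(value, []):
            if covered:
                source = entry  # "index only": answered from the entry
            else:
                self.fetch_count += 1
                source = self.docs[entry["_key"]]  # full document read
            results.append({f: source.get(f) for f in projection})
        return results


docs = [
    {"_key": "k1", "_type": "Image", "_path": "/img/a.png", "blob": "<base64>"},
    {"_key": "k2", "_type": "Image", "_path": "/img/b.png", "blob": "<base64>"},
    {"_key": "k3", "_type": "Page", "_path": "/index.html", "blob": ""},
]

# Index on _type only (as in the thread): projecting _path needs lookups.
type_only = ToyCollection(docs, ["_type"])
via_documents = type_only.query("Image", ["_path", "_key"])
print(type_only.fetch_count)  # 2

# Index on _type, _path, _key (the suggestion): no lookups needed.
covering = ToyCollection(docs, ["_type", "_path", "_key"])
via_index = covering.query("Image", ["_path", "_key"])
print(covering.fetch_count)  # 0
```

Both variants return the same results; the difference is only in how many full documents (here including the `blob` payload) have to be touched.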
>>>
>>> On Friday, June 14, 2019 at 9:41:24 AM UTC+2, Andreas Jung wrote:
>>>>
>>>> _key is a UUID4.
>>>> _path is a standard filesystem path, no longer than 100 characters.
>>>>
>>>> That cannot be the problem.
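A quick upper bound on the response size for the original query, using the figures given in this thread: 20,617 results (from the profile), a UUID4 `_key` (36 characters in its canonical hyphenated form) and a `_path` of at most 100 characters. The sketch serializes a worst-case result as compact JSON and ignores any protocol or envelope overhead.

```python
import json
import uuid

# Figures from the thread: 20,617 matching documents (profile), _key is a
# UUID4, _path is at most 100 characters long.
MATCHES = 20_617
worst_case = {"path": "/" + "x" * 99, "key": str(uuid.uuid4())}

# Serialize the whole result as compact JSON (no whitespace). Protocol and
# envelope overhead of a real server response are ignored here.
payload = json.dumps([worst_case] * MATCHES, separators=(",", ":"))
print(f"worst-case result size: {len(payload) / 1e6:.1f} MB")
```

At roughly 3 MB for the entire result, serialization alone seems unlikely to explain a runtime of more than 30 seconds, which matches the reply above and the profile, where almost all of the 32.79 s execution time is spent in the IndexNode.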
>>>>
>>>> On Friday, June 14, 2019 at 9:36:17 AM UTC+2, James Courtier-Dutton wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> What is the average size of the returned data? It could just be the
>>>>> time it takes to serialise the data being returned.
>>>>>
>>>>> James
>>>>>
>>>>> On Fri, 14 Jun 2019, 05:45 'Andreas Jung' via ArangoDB, <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> this query 
>>>>>>
>>>>>>  for doc in import 
>>>>>>    filter doc._type == 'Image'
>>>>>>    return {path: doc._path, key: doc._key}
>>>>>>
>>>>>> takes about 45 seconds on decent hardware, with an import collection
>>>>>> of about 100,000 items, about 21,000 of which have _type = 'Image'.
>>>>>> There is an index on _type. I'm using PyArango as the client... I really
>>>>>> wonder why this query runs so slowly?!
>>>>>>
>>>>>> Running ArangoDB 3.4.3
>>>>>>
>>>>>> Profile
>>>>>>
>>>>>> Query String:
>>>>>>  for doc in import 
>>>>>>  filter doc._type == 'Image'
>>>>>>  return {path: doc._path, key: doc._key}
>>>>>>
>>>>>> Execution plan:
>>>>>>  Id   NodeType          Calls   Items   Runtime [s]   Comment
>>>>>>   1   SingletonNode         1       1       0.00000   * ROOT
>>>>>>   7   IndexNode            21   20617      32.73956     - FOR doc IN import   /* hash index scan, projections: `_key`, `_path` */
>>>>>>   5   CalculationNode      21   20617       0.04354       - LET #3 = { "path" : doc.`_path`, "key" : doc.`_key` }   /* simple expression */   /* collections used: doc : import */
>>>>>>   6   ReturnNode           21   20617       0.00016       - RETURN #3
>>>>>>
>>>>>> Indexes used:
>>>>>>  By   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
>>>>>>   7   hash   import       false    false         0.05 %   [ `_type` ]   (doc.`_type` == "Image")
>>>>>>
>>>>>> Optimization rules applied:
>>>>>>  Id   RuleName
>>>>>>   1   move-calculations-up
>>>>>>   2   move-filters-up
>>>>>>   3   move-calculations-up-2
>>>>>>   4   move-filters-up-2
>>>>>>   5   use-indexes
>>>>>>   6   remove-filter-covered-by-index
>>>>>>   7   remove-unnecessary-calculations-2
>>>>>>   8   reduce-extraction-to-projection
>>>>>>
>>>>>> Query Statistics:
>>>>>>  Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Exec Time [s]
>>>>>>           0            0           0        20617          0        32.78928
>>>>>>
>>>>>> Query Profile:
>>>>>>  Query Stage           Duration [s]
>>>>>>  initializing               0.00001
>>>>>>  parsing                    0.00010
>>>>>>  optimizing ast             0.00001
>>>>>>  loading collections        0.00002
>>>>>>  instantiating plan         0.00005
>>>>>>  optimizing plan            0.00032
>>>>>>  executing                 32.78841
>>>>>>  finalizing                 0.00032
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "ArangoDB" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/arangodb/6c2de54c-3936-4aa5-8b6a-2dae3e5afcf7%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/arangodb/6c2de54c-3936-4aa5-8b6a-2dae3e5afcf7%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>

