Hi, Can you share a set of sample documents? How well is the distribution on `_type` ? Which samples are there? On Friday, June 14, 2019 at 11:22:51 AM UTC+2, Andreas Jung wrote: > > Recreating the indexes after import does not make a difference. > > Returning doc._path for 20.000 items takes 50 ms, returning doc._path > takes minutes > > The _path index is deduplicated, the _type index is not > > The only difference in the execution plans is "index only" when "RETURN > doc._type". Since both _type and _path > are fully indexed I would assume that the query is executed in both times > based on index data. > > So ArangoDB will load all 100.000 objects for picking up the value of > _path? The overall data is meanwhile 55 GB > (about one third of the data is binary data (files and images base64 > encoded). > > This is all no big problem for me since we perform such queries once > before a migration run and it does matter taking > a migration running for some hours a minutes more or less but I want to > understand what is going on here (in particular > this is unexpected behavior). > > > Query String: > for doc in import > filter doc._type == 'Image' > return doc._type > > Execution plan: > Id NodeType Est. Comment > 1 SingletonNode 1 * ROOT > 7 IndexNode 2214 - FOR doc IN import /* hash index scan, > index only, projections: `_type` */ > 5 CalculationNode 2214 - LET #3 = doc.`_type` /* attribute > expression */ /* collections used: doc : import */ > 6 ReturnNode 2214 - RETURN #3 > > Indexes used: > By Type Collection Unique Sparse Selectivity Fields > Ranges > 7 hash import false false 0.05 % [ `_type` ] > (doc.`_type` == "Image") > > Optimization rules applied: > Id RuleName > 1 move-calculations-up > 2 move-filters-up > 3 move-calculations-up-2 > 4 move-filters-up-2 > 5 use-indexes > 6 remove-filter-covered-by-index > 7 remove-unnecessary-calculations-2 > 8 reduce-extraction-to-projection > > > > Query String: > for doc in import > filter doc._type == 'Image' > return doc._path > > Execution plan: > Id NodeType Est. Comment > 1 SingletonNode 1 * ROOT > 7 IndexNode 2214 - FOR doc IN import /* hash index scan, > projections: `_path` */ > 5 CalculationNode 2214 - LET #3 = doc.`_path` /* attribute > expression */ /* collections used: doc : import */ > 6 ReturnNode 2214 - RETURN #3 > > Indexes used: > By Type Collection Unique Sparse Selectivity Fields > Ranges > 7 hash import false false 0.05 % [ `_type` ] > (doc.`_type` == "Image") > > Optimization rules applied: > Id RuleName > 1 move-calculations-up > 2 move-filters-up > 3 move-calculations-up-2 > 4 move-filters-up-2 > 5 use-indexes > 6 remove-filter-covered-by-index > 7 remove-unnecessary-calculations-2 > 8 reduce-extraction-to-projection > > > > > On Friday, June 14, 2019 at 9:54:10 AM UTC+2, Andreas Jung wrote: >> >> Using RocksDB (default installation). >> >> I create a new collection for every import of the data including the >> indexes. >> >> Unfortunately I don't have the key names in my hands. They are coming >> from a JSON dump of a CMS. >> >> Am Freitag, 14. Juni 2019 09:50:41 UTC+2 schrieb Wilfried Gösgens: >>> >>> Hi, >>> afair you're using rocksdb? >>> >>> can you try to re-create that index to be on `_type`, `_path`, `_key` >>> for better using of projections? >>> >>> Please note that you shouldn't use fieldnames starting with `_` since >>> they're defined as system specific fields in arangodb. >>> >>> Cheers, >>> Willi >>> >>> On Friday, June 14, 2019 at 9:41:24 AM UTC+2, Andreas Jung wrote: >>>> >>>> _key is a UUID4 >>>> _path is standard filesystem path not longer than 100 chars each >>>> >>>> That can not be the problem. >>>> >>>> Am Freitag, 14. Juni 2019 09:36:17 UTC+2 schrieb James Courtier-Dutton: >>>>> >>>>> Hi, >>>>> >>>>> What is the average size of the returned data? It could just be the >>>>> time it takes to serialise the data being returned >>>>> >>>>> James >>>>> >>>>> On Fri, 14 Jun 2019, 05:45 'Andreas Jung' via ArangoDB, < >>>>> [email protected]> wrote: >>>>> >>>>>> Hi there, >>>>>> >>>>>> this query >>>>>> >>>>>> for doc in import >>>>>> filter doc._type == 'Image' >>>>>> return {path: doc._path, key: doc._key} >>>>>> >>>>>> takes about 45 seconds on decent hardware with an import collection >>>>>> of about 100.000 items with about 21.000 of _type = 'Image'. >>>>>> There is an index of _type. Using PyArango as client...I really >>>>>> wander why this query is running so slow?! >>>>>> >>>>>> Running ArangoDB 3.4.3 >>>>>> >>>>>> Profile >>>>>> >>>>>> Query String: >>>>>> for doc in import >>>>>> filter doc._type == 'Image' >>>>>> return {path: doc._path, key: doc._key} >>>>>> >>>>>> Execution plan: >>>>>> Id NodeType Calls Items Runtime [s] Comment >>>>>> 1 SingletonNode 1 1 0.00000 * ROOT >>>>>> 7 IndexNode 21 20617 32.73956 - FOR doc IN >>>>>> import /* hash index scan, projections: `_key`, `_path` */ >>>>>> 5 CalculationNode 21 20617 0.04354 - LET #3 = >>>>>> { "path" : doc.`_path`, "key" : doc.`_key` } /* simple expression */ >>>>>> /* >>>>>> collections used: doc : import */ >>>>>> 6 ReturnNode 21 20617 0.00016 - RETURN #3 >>>>>> >>>>>> Indexes used: >>>>>> By Type Collection Unique Sparse Selectivity Fields >>>>>> Ranges >>>>>> 7 hash import false false 0.05 % [ `_type` >>>>>> ] (doc.`_type` == "Image") >>>>>> >>>>>> Optimization rules applied: >>>>>> Id RuleName >>>>>> 1 move-calculations-up >>>>>> 2 move-filters-up >>>>>> 3 move-calculations-up-2 >>>>>> 4 move-filters-up-2 >>>>>> 5 use-indexes >>>>>> 6 remove-filter-covered-by-index >>>>>> 7 remove-unnecessary-calculations-2 >>>>>> 8 reduce-extraction-to-projection >>>>>> >>>>>> Query Statistics: >>>>>> Writes Exec Writes Ign Scan Full Scan Index Filtered Exec >>>>>> Time [s] >>>>>> 0 0 0 20617 0 >>>>>> 32.78928 >>>>>> >>>>>> Query Profile: >>>>>> Query Stage Duration [s] >>>>>> initializing 0.00001 >>>>>> parsing 0.00010 >>>>>> optimizing ast 0.00001 >>>>>> loading collections 0.00002 >>>>>> instantiating plan 0.00005 >>>>>> optimizing plan 0.00032 >>>>>> executing 32.78841 >>>>>> finalizing 0.00032 >>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "ArangoDB" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to [email protected]. >>>>>> To view this discussion on the web visit >>>>>> https://groups.google.com/d/msgid/arangodb/6c2de54c-3936-4aa5-8b6a-2dae3e5afcf7%40googlegroups.com >>>>>> >>>>>> <https://groups.google.com/d/msgid/arangodb/6c2de54c-3936-4aa5-8b6a-2dae3e5afcf7%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>> . >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>>
-- You received this message because you are subscribed to the Google Groups "ArangoDB" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/arangodb/6c96c367-1391-446d-87bb-4083199cc6c9%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
