[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549975#comment-13549975 ]
Michael McCandless commented on LUCENE-4620:
--------------------------------------------

Trunk:
{noformat}
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 3630 (unsorted, length of: 2430) 41152 times.
[java]
[java] Encoder                                                    Bits/Int     Encode Time          Encode Time     Decode Time          Decode Time
[java]                                                                      [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] ---------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                       18.4955            4430              44.3003            1162              11.6201
[java] Sorting (Unique (VInt8))                                    18.4955            4344              43.4403            1105              11.0501
[java] Sorting (Unique (DGap (VInt8)))                              8.5597            4481              44.8103             842               8.4201
[java] Sorting (Unique (DGap (EightFlags (VInt8))))                 4.9679            4636              46.3603            1021              10.2101
[java] Sorting (Unique (DGap (FourFlags (VInt8))))                  4.8198            4515              45.1503            1001              10.0101
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))      4.5794            4904              49.0403            1056              10.5601
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))      4.5794            4751              47.5103            1035              10.3501
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 9910 (unsorted, length of: 1489) 67159 times.
[java]
[java] Encoder                                                    Bits/Int     Encode Time          Encode Time     Decode Time          Decode Time
[java]                                                                      [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] ---------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                       18.2673            1241              12.4100            1128              11.2800
[java] Sorting (Unique (VInt8))                                    18.2673            3488              34.8801             924               9.2400
[java] Sorting (Unique (DGap (VInt8)))                              8.9456            3061              30.6101             660               6.6000
[java] Sorting (Unique (DGap (EightFlags (VInt8))))                 5.7542            3693              36.9301            1026              10.2600
[java] Sorting (Unique (DGap (FourFlags (VInt8))))                  5.5447            3462              34.6201             811               8.1100
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))      5.3566            3846              38.4601            1018              10.1800
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))      5.3996            3879              38.7901            1025              10.2500
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 10000 (unsorted, length of: 18) 5555555 times.
[java]
[java] Encoder                                                    Bits/Int     Encode Time          Encode Time     Decode Time          Decode Time
[java]                                                                      [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] ---------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                       20.8889            1179              11.7900            1114              11.1400
[java] Sorting (Unique (VInt8))                                    20.8889            2251              22.5100            1171              11.7100
[java] Sorting (Unique (DGap (VInt8)))                             12.0000            2174              21.7400             848               8.4800
[java] Sorting (Unique (DGap (EightFlags (VInt8))))                10.2222            2372              23.7200            1092              10.9200
[java] Sorting (Unique (DGap (FourFlags (VInt8))))                 10.2222            2355              23.5500            1062              10.6200
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))      9.7778            2414              24.1400            1085              10.8500
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))     10.2222            2492              24.9200            1130              11.3000
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 501871 (unsorted, length of: 957) 104493 times.
[java]
[java] Encoder                                                    Bits/Int     Encode Time          Encode Time     Decode Time          Decode Time
[java]                                                                      [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] ---------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                       16.5768             998               9.9800             896               8.9600
[java] Sorting (Unique (VInt8))                                    16.5768            2542              25.4201             864               8.6400
[java] Sorting (Unique (DGap (VInt8)))                              8.4848            2468              24.6800             646               6.4600
[java] Sorting (Unique (DGap (EightFlags (VInt8))))                 4.4138            2526              25.2601             768               7.6800
[java] Sorting (Unique (DGap (FourFlags (VInt8))))                  4.1797            2406              24.0600             696               6.9600
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))      3.8955            2541              25.4101             802               8.0200
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))      3.8871            2537              25.3701             770               7.7000
[java]
{noformat}

Patch:
{noformat}
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 3630 (unsorted, length of: 2430) 41152 times.
[java]
[java] Encoder                                                    Bits/Int     Encode Time          Encode Time     Decode Time          Decode Time
[java]                                                                      [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] ---------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                       18.4955             594               5.9400             419               4.1900
[java] Sorting (Unique (VInt8))                                    18.4955            3147              31.4702             579               5.7900
[java] Sorting (Unique (DGap (VInt8)))                              8.5597            3167              31.6702             278               2.7800
[java] Sorting (Unique (DGap (EightFlags (VInt))))                  4.9679            3624              36.2402             401               4.0100
[java] Sorting (Unique (DGap (FourFlags (VInt))))                   4.8198            3534              35.3402             379               3.7900
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))       4.5794            3954              39.5403             580               5.8000
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))       4.5794            3947              39.4703             595               5.9500
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 9910 (unsorted, length of: 1489) 67159 times.
[java]
[java] Encoder                                                    Bits/Int     Encode Time          Encode Time     Decode Time          Decode Time
[java]                                                                      [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] ---------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                       18.2673             592               5.9200             441               4.4100
[java] Sorting (Unique (VInt8))                                    18.2673            2002              20.0200             443               4.4300
[java] Sorting (Unique (DGap (VInt8)))                              8.9456            2077              20.7701             301               3.0100
[java] Sorting (Unique (DGap (EightFlags (VInt))))                  5.7542            2646              26.4601             419               4.1900
[java] Sorting (Unique (DGap (FourFlags (VInt))))                   5.5447            2505              25.0501             375               3.7500
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))       5.3566            2984              29.8401             625               6.2500
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))       5.3996            2997              29.9701             616               6.1600
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 10000 (unsorted, length of: 18) 5555555 times.
[java]
[java] Encoder                                                    Bits/Int     Encode Time          Encode Time     Decode Time          Decode Time
[java]                                                                      [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] ---------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                       20.8889             585               5.8500             585               5.8500
[java] Sorting (Unique (VInt8))                                    20.8889            1127              11.2700             588               5.8800
[java] Sorting (Unique (DGap (VInt8)))                             12.0000            1156              11.5600             477               4.7700
[java] Sorting (Unique (DGap (EightFlags (VInt))))                 10.2222            1346              13.4600             657               6.5700
[java] Sorting (Unique (DGap (FourFlags (VInt))))                  10.2222            1385              13.8500             573               5.7300
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))       9.7778            1565              15.6500             845               8.4500
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))      10.2222            1662              16.6200             891               8.9100
[java]
[java] Estimating ~100000000 Integers compression time by
[java] Encoding/decoding facets' ID payload of docID = 501871 (unsorted, length of: 957) 104493 times.
[java]
[java] Encoder                                                    Bits/Int     Encode Time          Encode Time     Decode Time          Decode Time
[java]                                                                      [milliseconds]  [microsecond / int]  [milliseconds]  [microsecond / int]
[java] ---------------------------------------------------------------------------------------------------------------------------------------------
[java] VInt8                                                       16.5768             446               4.4600             439               4.3900
[java] Sorting (Unique (VInt8))                                    16.5768            1429              14.2900             420               4.2000
[java] Sorting (Unique (DGap (VInt8)))                              8.4848            1390              13.9000             298               2.9800
[java] Sorting (Unique (DGap (EightFlags (VInt))))                  4.4138            1457              14.5700             387               3.8700
[java] Sorting (Unique (DGap (FourFlags (VInt))))                   4.1797            1529              15.2900             368               3.6800
[java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))       3.8955            1829              18.2900             530               5.3000
[java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))       3.8871            1842              18.4200             528               5.2800
[java]
{noformat}

Looks like ~2-3X faster... good!
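
For readers unfamiliar with the encoder names in the tables, here is a rough sketch of what a chain like Sorting (Unique (DGap (VInt8))) does when exposed as a bulk array-in, array-out call. This is not the code from the patch: the class name DGapVIntSketch, the method signatures, and the low-order-group-first byte layout are all assumptions made for illustration, and the real facet module encoders may lay out bytes differently. It also assumes non-negative ordinals and well-formed input on decode.

```java
import java.util.Arrays;

/** Hypothetical sketch of a sort + unique + delta-gap + VInt chain. Not Lucene's implementation. */
final class DGapVIntSketch {

    /** Bulk encode: sorts and de-duplicates values[0..length), then writes each gap as a variable-length int. */
    static byte[] encode(int[] values, int length) {
        int[] sorted = Arrays.copyOf(values, length);
        Arrays.sort(sorted);                      // "Sorting"
        byte[] buf = new byte[length * 5];        // worst case: 5 bytes per 32-bit int
        int upto = 0;
        int prev = 0;
        for (int i = 0; i < length; i++) {
            if (i > 0 && sorted[i] == sorted[i - 1]) {
                continue;                         // "Unique": skip duplicates
            }
            int gap = sorted[i] - prev;           // "DGap": store the difference from the previous value
            prev = sorted[i];
            while ((gap & ~0x7F) != 0) {          // VInt: 7 payload bits per byte, high bit means "more follows"
                buf[upto++] = (byte) ((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            buf[upto++] = (byte) gap;
        }
        return Arrays.copyOf(buf, upto);
    }

    /** Bulk decode: reads VInt gaps from bytes[0..length) and accumulates them back into values. */
    static int[] decode(byte[] bytes, int length) {
        int[] out = new int[length];              // at most one value per input byte
        int count = 0;
        int prev = 0;
        int pos = 0;
        while (pos < length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap;                          // undo the delta-gap step
            out[count++] = prev;
        }
        return Arrays.copyOf(out, count);
    }

    public static void main(String[] args) {
        int[] ordinals = {7, 3, 300, 3, 100000};
        byte[] encoded = encode(ordinals, ordinals.length);
        int[] decoded = decode(encoded, encoded.length);
        // prints "7 bytes -> [3, 7, 300, 100000]"
        System.out.println(encoded.length + " bytes -> " + Arrays.toString(decoded));
    }
}
```

The gaps in this example are 3, 4, 293 and 99700, which take 1, 1, 2 and 3 bytes respectively. That is the same effect that shows up in the Bits/Int column once DGap is added to the chain. Working on whole arrays in one call, rather than one int per call, is the general shape of the bulk API the issue proposes.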

> Explore IntEncoder/Decoder bulk API
> -----------------------------------
>
>                 Key: LUCENE-4620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4620
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>         Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int)
> and decode(int). Originally, we believed that this layer can be useful for
> other scenarios, but in practice it's used only for writing/reading the
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder
> can still be streaming (as we don't know in advance how many ints will be
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet
> associations, which can write arbitrary byte[], and so maybe decoding to an
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts
> etc.) and later read, with as little overhead as possible.