[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549975#comment-13549975
 ] 

Michael McCandless commented on LUCENE-4620:
--------------------------------------------

Trunk:
{noformat}
     [java] Estimating ~100000000 Integers compression time by
     [java] Encoding/decoding facets' ID payload of docID = 3630 (unsorted, length of: 2430) 41152 times.
     [java] 
     [java] Encoder                                                  Bits/Int          Encode Time          Encode Time          Decode Time          Decode Time
     [java]                                                                         [milliseconds]  [microsecond / int]       [milliseconds]  [microsecond / int]
     [java] -----------------------------------------------------------------------------------------------------------------------------------------------------
     [java] VInt8                                                     18.4955                 4430              44.3003                 1162              11.6201
     [java] Sorting (Unique (VInt8))                                  18.4955                 4344              43.4403                 1105              11.0501
     [java] Sorting (Unique (DGap (VInt8)))                            8.5597                 4481              44.8103                  842               8.4201
     [java] Sorting (Unique (DGap (EightFlags (VInt8))))               4.9679                 4636              46.3603                 1021              10.2101
     [java] Sorting (Unique (DGap (FourFlags (VInt8))))                4.8198                 4515              45.1503                 1001              10.0101
     [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))    4.5794                 4904              49.0403                 1056              10.5601
     [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))    4.5794                 4751              47.5103                 1035              10.3501
     [java] 
     [java] 
     [java] Estimating ~100000000 Integers compression time by
     [java] Encoding/decoding facets' ID payload of docID = 9910 (unsorted, length of: 1489) 67159 times.
     [java] 
     [java] Encoder                                                  Bits/Int          Encode Time          Encode Time          Decode Time          Decode Time
     [java]                                                                         [milliseconds]  [microsecond / int]       [milliseconds]  [microsecond / int]
     [java] -----------------------------------------------------------------------------------------------------------------------------------------------------
     [java] VInt8                                                     18.2673                 1241              12.4100                 1128              11.2800
     [java] Sorting (Unique (VInt8))                                  18.2673                 3488              34.8801                  924               9.2400
     [java] Sorting (Unique (DGap (VInt8)))                            8.9456                 3061              30.6101                  660               6.6000
     [java] Sorting (Unique (DGap (EightFlags (VInt8))))               5.7542                 3693              36.9301                 1026              10.2600
     [java] Sorting (Unique (DGap (FourFlags (VInt8))))                5.5447                 3462              34.6201                  811               8.1100
     [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))    5.3566                 3846              38.4601                 1018              10.1800
     [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))    5.3996                 3879              38.7901                 1025              10.2500
     [java] 
     [java] 
     [java] Estimating ~100000000 Integers compression time by
     [java] Encoding/decoding facets' ID payload of docID = 10000 (unsorted, length of: 18) 5555555 times.
     [java] 
     [java] Encoder                                                  Bits/Int          Encode Time          Encode Time          Decode Time          Decode Time
     [java]                                                                         [milliseconds]  [microsecond / int]       [milliseconds]  [microsecond / int]
     [java] -----------------------------------------------------------------------------------------------------------------------------------------------------
     [java] VInt8                                                     20.8889                 1179              11.7900                 1114              11.1400
     [java] Sorting (Unique (VInt8))                                  20.8889                 2251              22.5100                 1171              11.7100
     [java] Sorting (Unique (DGap (VInt8)))                           12.0000                 2174              21.7400                  848               8.4800
     [java] Sorting (Unique (DGap (EightFlags (VInt8))))              10.2222                 2372              23.7200                 1092              10.9200
     [java] Sorting (Unique (DGap (FourFlags (VInt8))))               10.2222                 2355              23.5500                 1062              10.6200
     [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))    9.7778                 2414              24.1400                 1085              10.8500
     [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))   10.2222                 2492              24.9200                 1130              11.3000
     [java] 
     [java] 
     [java] Estimating ~100000000 Integers compression time by
     [java] Encoding/decoding facets' ID payload of docID = 501871 (unsorted, length of: 957) 104493 times.
     [java] 
     [java] Encoder                                                  Bits/Int          Encode Time          Encode Time          Decode Time          Decode Time
     [java]                                                                         [milliseconds]  [microsecond / int]       [milliseconds]  [microsecond / int]
     [java] -----------------------------------------------------------------------------------------------------------------------------------------------------
     [java] VInt8                                                     16.5768                  998               9.9800                  896               8.9600
     [java] Sorting (Unique (VInt8))                                  16.5768                 2542              25.4201                  864               8.6400
     [java] Sorting (Unique (DGap (VInt8)))                            8.4848                 2468              24.6800                  646               6.4600
     [java] Sorting (Unique (DGap (EightFlags (VInt8))))               4.4138                 2526              25.2601                  768               7.6800
     [java] Sorting (Unique (DGap (FourFlags (VInt8))))                4.1797                 2406              24.0600                  696               6.9600
     [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))    3.8955                 2541              25.4101                  802               8.0200
     [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))    3.8871                 2537              25.3701                  770               7.7000
     [java] 
{noformat}

Patch:

{noformat}
     [java] Estimating ~100000000 Integers compression time by
     [java] Encoding/decoding facets' ID payload of docID = 3630 (unsorted, length of: 2430) 41152 times.
     [java] 
     [java] Encoder                                                  Bits/Int          Encode Time          Encode Time          Decode Time          Decode Time
     [java]                                                                         [milliseconds]  [microsecond / int]       [milliseconds]  [microsecond / int]
     [java] -----------------------------------------------------------------------------------------------------------------------------------------------------
     [java] VInt8                                                     18.4955                  594               5.9400                  419               4.1900
     [java] Sorting (Unique (VInt8))                                  18.4955                 3147              31.4702                  579               5.7900
     [java] Sorting (Unique (DGap (VInt8)))                            8.5597                 3167              31.6702                  278               2.7800
     [java] Sorting (Unique (DGap (EightFlags (VInt))))                4.9679                 3624              36.2402                  401               4.0100
     [java] Sorting (Unique (DGap (FourFlags (VInt))))                 4.8198                 3534              35.3402                  379               3.7900
     [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))     4.5794                 3954              39.5403                  580               5.8000
     [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))     4.5794                 3947              39.4703                  595               5.9500
     [java] 
     [java] 
     [java] Estimating ~100000000 Integers compression time by
     [java] Encoding/decoding facets' ID payload of docID = 9910 (unsorted, length of: 1489) 67159 times.
     [java] 
     [java] Encoder                                                  Bits/Int          Encode Time          Encode Time          Decode Time          Decode Time
     [java]                                                                         [milliseconds]  [microsecond / int]       [milliseconds]  [microsecond / int]
     [java] -----------------------------------------------------------------------------------------------------------------------------------------------------
     [java] VInt8                                                     18.2673                  592               5.9200                  441               4.4100
     [java] Sorting (Unique (VInt8))                                  18.2673                 2002              20.0200                  443               4.4300
     [java] Sorting (Unique (DGap (VInt8)))                            8.9456                 2077              20.7701                  301               3.0100
     [java] Sorting (Unique (DGap (EightFlags (VInt))))                5.7542                 2646              26.4601                  419               4.1900
     [java] Sorting (Unique (DGap (FourFlags (VInt))))                 5.5447                 2505              25.0501                  375               3.7500
     [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))     5.3566                 2984              29.8401                  625               6.2500
     [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))     5.3996                 2997              29.9701                  616               6.1600
     [java] 
     [java] 
     [java] Estimating ~100000000 Integers compression time by
     [java] Encoding/decoding facets' ID payload of docID = 10000 (unsorted, length of: 18) 5555555 times.
     [java] 
     [java] Encoder                                                  Bits/Int          Encode Time          Encode Time          Decode Time          Decode Time
     [java]                                                                         [milliseconds]  [microsecond / int]       [milliseconds]  [microsecond / int]
     [java] -----------------------------------------------------------------------------------------------------------------------------------------------------
     [java] VInt8                                                     20.8889                  585               5.8500                  585               5.8500
     [java] Sorting (Unique (VInt8))                                  20.8889                 1127              11.2700                  588               5.8800
     [java] Sorting (Unique (DGap (VInt8)))                           12.0000                 1156              11.5600                  477               4.7700
     [java] Sorting (Unique (DGap (EightFlags (VInt))))               10.2222                 1346              13.4600                  657               6.5700
     [java] Sorting (Unique (DGap (FourFlags (VInt))))                10.2222                 1385              13.8500                  573               5.7300
     [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))     9.7778                 1565              15.6500                  845               8.4500
     [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))    10.2222                 1662              16.6200                  891               8.9100
     [java] 
     [java] 
     [java] Estimating ~100000000 Integers compression time by
     [java] Encoding/decoding facets' ID payload of docID = 501871 (unsorted, length of: 957) 104493 times.
     [java] 
     [java] Encoder                                                  Bits/Int          Encode Time          Encode Time          Decode Time          Decode Time
     [java]                                                                         [milliseconds]  [microsecond / int]       [milliseconds]  [microsecond / int]
     [java] -----------------------------------------------------------------------------------------------------------------------------------------------------
     [java] VInt8                                                     16.5768                  446               4.4600                  439               4.3900
     [java] Sorting (Unique (VInt8))                                  16.5768                 1429              14.2900                  420               4.2000
     [java] Sorting (Unique (DGap (VInt8)))                            8.4848                 1390              13.9000                  298               2.9800
     [java] Sorting (Unique (DGap (EightFlags (VInt))))                4.4138                 1457              14.5700                  387               3.8700
     [java] Sorting (Unique (DGap (FourFlags (VInt))))                 4.1797                 1529              15.2900                  368               3.6800
     [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt)))))     3.8955                 1829              18.2900                  530               5.3000
     [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt)))))     3.8871                 1842              18.4200                  528               5.2800
     [java] 
{noformat}

Looks like decode is ~2-3X faster with the patch in most cases... good!
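
To make the comparison concrete, here is a small sketch that computes the decode speedup (trunk time divided by patch time) for the first three encoders of the docID = 3630 run. The class and method names are hypothetical; the millisecond values are copied from the Trunk and Patch tables above.

```java
/** Hypothetical helper, not part of the patch; it only divides numbers taken from the tables. */
final class DecodeSpeedup {

  /** Returns how many times faster the patch is than trunk for one measurement. */
  static double speedup(long trunkMillis, long patchMillis) {
    return (double) trunkMillis / patchMillis;
  }

  public static void main(String[] args) {
    // Decode times in milliseconds for docID = 3630, in table order.
    String[] encoders = {
      "VInt8", "Sorting (Unique (VInt8))", "Sorting (Unique (DGap (VInt8)))"
    };
    long[] trunk = {1162, 1105, 842};
    long[] patch = {419, 579, 278};
    for (int i = 0; i < encoders.length; i++) {
      System.out.printf("%-35s %.2fx%n", encoders[i], speedup(trunk[i], patch[i]));
    }
  }
}
```

The same division can be applied to the encode columns, where the gain varies much more between plain VInt8 and the Sorting-based encoders.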

                
> Explore IntEncoder/Decoder bulk API
> -----------------------------------
>
>                 Key: LUCENE-4620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4620
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>         Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer can be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], and so maybe decoding to an 
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.
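
As an illustration of the bulk idea described in the issue, below is a minimal sketch of a variable-length int codec that encodes and decodes a whole array per call instead of one int at a time. It is not the code from the attached patch: the class name BulkVIntSketch is hypothetical, plain int[] and byte[] stand in for IntsRef and BytesRef so the example has no Lucene dependency, and the byte layout (7 bits per byte, low-order bits first) is an assumption that may differ from the facet module's VInt8 encoder.

```java
import java.util.Arrays;

/** Hypothetical bulk VInt codec; int[]/byte[] are stand-ins for IntsRef/BytesRef. */
final class BulkVIntSketch {

  /** Encodes the first {@code length} values in one call. */
  static byte[] encode(int[] values, int length) {
    byte[] out = new byte[length * 5]; // a 32-bit int needs at most 5 bytes
    int upto = 0;
    for (int i = 0; i < length; i++) {
      int v = values[i];
      // Emit 7 bits at a time; the high bit marks "more bytes follow".
      while ((v & ~0x7F) != 0) {
        out[upto++] = (byte) ((v & 0x7F) | 0x80);
        v >>>= 7;
      }
      out[upto++] = (byte) v;
    }
    return Arrays.copyOf(out, upto);
  }

  /** Decodes every value in {@code bytes} in one call. */
  static int[] decode(byte[] bytes) {
    int[] out = new int[bytes.length]; // each int takes at least one byte
    int count = 0;
    int value = 0;
    int shift = 0;
    for (byte b : bytes) {
      value |= (b & 0x7F) << shift;
      if ((b & 0x80) != 0) {
        shift += 7; // continuation byte: keep accumulating
      } else {
        out[count++] = value; // last byte of this int
        value = 0;
        shift = 0;
      }
    }
    return Arrays.copyOf(out, count);
  }

  public static void main(String[] args) {
    int[] ordinals = {3, 130, 70000};
    byte[] encoded = encode(ordinals, ordinals.length);
    System.out.println(encoded.length + " bytes -> " + Arrays.toString(decode(encoded)));
  }
}
```

The point of the bulk shape is that the per-value loop lives inside the codec, so a caller pays one method call per document rather than one per ordinal.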
