[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974215#comment-16974215
 ] 

Ignacio Vera commented on LUCENE-8997:
--------------------------------------

I would like to raise this issue again, as I have made a small improvement. I 
realised that for points I do not need to store the point information in the 
data dimensions, so I can simply leave dimensions 5 and 6 empty. BKD tree 
leaves that contain only points should therefore compress very well.

I have run the Lucene geo benchmarks for LatLonShape and got a 30% reduction 
in index size!

 
{code}
Approach | Index time (sec)       | Force merge time (sec) | Index size (GB)    | Reader heap (MB)
         | Dev    | Base   | Diff | Dev    | Base   | Diff | Dev  | Base | Diff | Dev  | Base | Diff
shapes   | 260.8s | 264.2s | -1%  | 0.0s   | 0.0s   | 0%   | 0.89 | 1.27 | -30% | 1.14 | 1.78 | -36%
{code}

> Add type of triangle info to ShapeField encoding
> ------------------------------------------------
>
>                 Key: LUCENE-8997
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8997
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently encode three types of triangle in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, meaning the index was created before this information was 
> added. In this case we can compute the information while decoding, for 
> backwards compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so that we don't need to decode all 
> dimensions in the case of POINT and LINE. Some of the methods currently 
> compute the type of triangle they are dealing with; that computation can go 
> away as well.
>  
>  
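
The two-bit scheme in the description above could be sketched roughly as 
follows. This is only an illustration, not the ShapeField implementation: the 
class name, the method names and the SHIFT position of the two spare bits are 
hypothetical, and the bit patterns are read with the first listed bit as the 
high bit, which is an assumption.

```java
final class TriangleTypeBits {
  // Hypothetical position of the two spare bits inside the header int.
  static final int SHIFT = 6;
  static final int MASK = 0b11 << SHIFT;

  // Bit patterns from the issue description, read as (high bit, low bit).
  enum Type {
    UNKNOWN(0b00), POINT(0b10), LINE(0b01), TRIANGLE(0b11);

    final int bits;

    Type(int bits) {
      this.bits = bits;
    }
  }

  // Derives the type from the three vertices, as an old index would require.
  static Type classify(int aX, int aY, int bX, int bY, int cX, int cY) {
    boolean ab = aX == bX && aY == bY;
    boolean bc = bX == cX && bY == cY;
    boolean ac = aX == cX && aY == cY;
    if (ab && bc) {
      return Type.POINT;
    }
    if (ab || bc || ac) {
      return Type.LINE;
    }
    return Type.TRIANGLE;
  }

  // Stores the type in the spare bits and leaves all other bits untouched.
  static int writeType(int header, Type type) {
    return (header & ~MASK) | (type.bits << SHIFT);
  }

  // Reads the stored type; 0 0 means "unknown", so fall back to computing it.
  static Type readType(int header, int aX, int aY, int bX, int bY, int cX, int cY) {
    int bits = (header & MASK) >>> SHIFT;
    for (Type t : Type.values()) {
      if (t != Type.UNKNOWN && t.bits == bits) {
        return t;
      }
    }
    return classify(aX, aY, bX, bY, cX, cY);
  }

  public static void main(String[] args) {
    int header = writeType(0b101, classify(1, 1, 1, 1, 1, 1));
    System.out.println(readType(header, 1, 1, 1, 1, 1, 1));
    // Header written without type bits: the type is recomputed on read.
    System.out.println(readType(0b101, 0, 0, 2, 2, 0, 0));
  }
}
```

Because 0 0 is reserved for "unknown", headers written before the change decode 
unchanged and only pay the cost of recomputing the type on read.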



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
