[ https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974215#comment-16974215 ]
Ignacio Vera edited comment on LUCENE-8997 at 11/14/19 1:19 PM:
----------------------------------------------------------------

I would like to raise this issue again, as I have made a small improvement. I realised that for points I do not need to add the point information for the data dimensions, so I can just leave dimensions 5 and 6 empty. BKD tree leaves that contain only points will therefore compress very well. I ran the Lucene geo benchmarks for LatLonShape and got a 30% reduction in index size!

{code}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader heap (MB)||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|shapes|244.7s|250.7s|-2%|0.0s|0.0s|0%|0.89|1.27|-30%|1.14|1.14|0%|
{code}

was (Author: ivera):

I would like to raise this issue again, as I have made a small improvement. I realised that for points I do not need to add the point information for the data dimensions, so I can just leave dimensions 5 and 6 empty. BKD tree leaves that contain only points will therefore compress very well. I ran the Lucene geo benchmarks for LatLonShape and got a 30% reduction in index size!

{code}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader heap (MB)||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|shapes|260.8s|264.2s|-1%|0.0s|0.0s|0%|0.89|1.27|-30%|1.14|1.78|-36%|
{code}

> Add type of triangle info to ShapeField encoding
> ------------------------------------------------
>
>                 Key: LUCENE-8997
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8997
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangle in ShapeField:
> * POINT: all three coordinates are equal
> * LINE: two coordinates are equal
> * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this
> information in those two bits as follows:
> * 0 0: Unknown, so this is an index created before this information was added.
>   In this case we can compute the information while decoding, for backwards
>   compatibility.
> * 1 0: The encoded triangle is a POINT
> * 0 1: The encoded triangle is a LINE
> * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so that we don't need to decode all
> dimensions in the case of POINT and LINE. Some of the methods currently
> compute the type of triangle we are dealing with; that computation can go
> away as well.
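
The two-bit scheme proposed in the issue description could be sketched roughly as follows. This is only an illustration and not the actual ShapeField code: the class `TriangleTypeBits`, its method names, the choice of the two lowest bits of an `int` as the "unused bits", and the reading of "1 0" as (high bit, low bit) are all assumptions made for the example.

```java
// Hypothetical sketch of the proposed encoding; names and bit positions are assumptions.
final class TriangleTypeBits {
    /** The three kinds of triangle named in the issue description. */
    enum Type { POINT, LINE, TRIANGLE }

    // Assumption: the two spare bits are the two lowest bits of a metadata int.
    private static final int TYPE_MASK = 0b11;
    private static final int POINT_BITS = 0b10;    // "1 0"
    private static final int LINE_BITS = 0b01;     // "0 1"
    private static final int TRIANGLE_BITS = 0b11; // "1 1"
    // 0b00 ("0 0") means unknown: the index predates this information.

    private TriangleTypeBits() {}

    /** Derives the type from the three vertices (a, b, c). */
    static Type classify(int ax, int ay, int bx, int by, int cx, int cy) {
        boolean ab = ax == bx && ay == by;
        boolean bc = bx == cx && by == cy;
        boolean ac = ax == cx && ay == cy;
        if (ab && bc) {
            return Type.POINT;     // all three vertices are equal
        }
        if (ab || bc || ac) {
            return Type.LINE;      // exactly two vertices are equal
        }
        return Type.TRIANGLE;      // all vertices are different
    }

    /** Stores the type in the two spare bits, leaving all other bits untouched. */
    static int withType(int metadata, Type type) {
        int bits;
        switch (type) {
            case POINT: bits = POINT_BITS; break;
            case LINE: bits = LINE_BITS; break;
            default: bits = TRIANGLE_BITS; break;
        }
        return (metadata & ~TYPE_MASK) | bits;
    }

    /** Reads the type, falling back to the vertices when the bits are "0 0". */
    static Type typeOf(int metadata, int ax, int ay, int bx, int by, int cx, int cy) {
        switch (metadata & TYPE_MASK) {
            case POINT_BITS: return Type.POINT;
            case LINE_BITS: return Type.LINE;
            case TRIANGLE_BITS: return Type.TRIANGLE;
            default: return classify(ax, ay, bx, by, cx, cy); // pre-existing index
        }
    }
}
```

With a layout like this, a reader that sees the POINT or LINE bits would not need to compare vertices at all, and only indices written before the change would take the fallback path.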