[ 
https://issues.apache.org/jira/browse/LUCENE-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647128#comment-14647128
 ] 

Nicholas Knize edited comment on LUCENE-6699 at 7/30/15 12:00 PM:
------------------------------------------------------------------

bq. The obvious alternative would be to store just lat/lon, exactly like is 
done for Nicholas's code

Just to clarify, so there's no confusion, the attached code converts from 
lat/lon to ECEF (either full range in meters, or scaled to the unit spheroid) 
and stores the result in a 96 bit BitSet. GeoPointField stores encoded lat/lon 
(if that's what you were referring to).  The attached was intended to be used 
for a Geo3d integration w/ GeoPointField.

bq.  I expect the conversion for every record would turn out to be more 
expensive.

Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative 
conversion (2 sines, 2 cosines, 1 sqrt).  Conversion cost vs. wateful 
representation is the interesting question, so I threw together a quick 
encoding/decoding benchmark (on my i7 16GB System76) and ran it on 100M points 
(converting from lla to ECEF and back). The law of large numbers took over at 
around 35M. For interest the results are provided:

{noformat}
Avg computation: 620.4319757666667 ns  Trials: 30000000  Total time: 
18612.959273 ms
Avg computation: 621.3008362285715 ns  Trials: 35000000  Total time: 
21745.529268 ms
Avg computation: 621.582647925 ns  Trials: 40000000  Total time: 24863.305917 ms
Avg computation: 621.3724589555555 ns  Trials: 45000000  Total time: 
27961.760653 ms
Avg computation: 621.16271364 ns  Trials: 50000000  Total time: 31058.135682 ms
Avg computation: 621.3857686909091 ns  Trials: 55000000  Total time: 
34176.217278 ms
Avg computation: 621.8110524 ns  Trials: 60000000  Total time: 37308.663144 ms
Avg computation: 621.6768083230769 ns  Trials: 65000000  Total time: 
40408.992541 ms
Avg computation: 621.5324306714285 ns  Trials: 70000000  Total time: 
43507.270147 ms
Avg computation: 621.4440536933333 ns  Trials: 75000000  Total time: 
46608.304027 ms
Avg computation: 621.7594845875 ns  Trials: 80000000  Total time: 49740.758767 
ms
Avg computation: 621.9327540705882 ns  Trials: 85000000  Total time: 
52864.284096 ms
Avg computation: 621.9429434555556 ns  Trials: 90000000  Total time: 
55974.864911 ms
Avg computation: 621.8868688947368 ns  Trials: 95000000  Total time: 
59079.252545 ms
Avg computation: 621.98037608 ns  Trials: 100000000  Total time: 62198.037608 ms
{noformat}

Again, those are both conversions to ECEF and back to LLA.  So roughly 1 minute 
for 100M points.  Halve those numbers and you have the cost for converting 
either direction. 

Next we could benchmark tree access and compare?  I suspect traversal cost 
should be a function of the on-disk representation and chosen split method 
(that's where RTrees tend to become costly). Since BKD splits a sorted space, 
and it can exploit file system cache? I suspect there aren't many random seeks? 
Seems benchmarking might be relatively straightforward?


was (Author: nknize):
bq. The obvious alternative would be to store just lat/lon, exactly like is 
done for Nicholas's code

Just to clarify, so there's no confusion, the attached code converts from 
lat/lon to ECEF (either full range in meters, or scaled to the unit spheroid) 
and stores the result in a 96 bit BitSet. GeoPointField stores encoded lat/lon 
(if that's what you were referring to).  The attached was intended to be used 
for a Geo3d integration w/ GeoPointField.

bq.  I expect the conversion for every record would turn out to be more 
expensive.

Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative 
conversion (2 sines, 2 cosines, 1 sqrt).  Conversion cost vs. wateful 
representation is the interesting question, so I threw together a quick 
encoding/decoding benchmark (on my i7 16GB System76) and ran it on 1B points 
(converting from lla to ECEF and back). The law of large numbers took over at 
around 350M. For interest the results are provided:

{noformat}
Avg computation: 620.4319757666667 ns  Trials: 30000000  Total time: 
18612.959273 ms
Avg computation: 621.3008362285715 ns  Trials: 35000000  Total time: 
21745.529268 ms
Avg computation: 621.582647925 ns  Trials: 40000000  Total time: 24863.305917 ms
Avg computation: 621.3724589555555 ns  Trials: 45000000  Total time: 
27961.760653 ms
Avg computation: 621.16271364 ns  Trials: 50000000  Total time: 31058.135682 ms
Avg computation: 621.3857686909091 ns  Trials: 55000000  Total time: 
34176.217278 ms
Avg computation: 621.8110524 ns  Trials: 60000000  Total time: 37308.663144 ms
Avg computation: 621.6768083230769 ns  Trials: 65000000  Total time: 
40408.992541 ms
Avg computation: 621.5324306714285 ns  Trials: 70000000  Total time: 
43507.270147 ms
Avg computation: 621.4440536933333 ns  Trials: 75000000  Total time: 
46608.304027 ms
Avg computation: 621.7594845875 ns  Trials: 80000000  Total time: 49740.758767 
ms
Avg computation: 621.9327540705882 ns  Trials: 85000000  Total time: 
52864.284096 ms
Avg computation: 621.9429434555556 ns  Trials: 90000000  Total time: 
55974.864911 ms
Avg computation: 621.8868688947368 ns  Trials: 95000000  Total time: 
59079.252545 ms
Avg computation: 621.98037608 ns  Trials: 100000000  Total time: 62198.037608 ms
{noformat}

Again, those are both conversions to ECEF and back to LLA.  So roughly 1 minute 
for 1B points.  Halve those numbers and you have the cost for converting either 
direction. 

Next we could benchmark tree access and compare?  I suspect traversal cost 
should be a function of the on-disk representation and chosen split method 
(that's where RTrees tend to become costly). Since BKD splits a sorted space, 
and it can exploit file system cache? I suspect there aren't many random seeks? 
Seems benchmarking might be relatively straightforward?

> Integrate lat/lon BKD and spatial3d
> -----------------------------------
>
>                 Key: LUCENE-6699
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6699
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: Geo3DPacking.java
>
>
> I'm opening this for discussion, because I'm not yet sure how to do
> this integration, because of my ignorance about spatial in general and
> spatial3d in particular :)
> Our BKD tree impl is very fast at doing lat/lon shape intersection
> (bbox, polygon, soon distance: LUCENE-6698) against previously indexed
> points.
> I think to integrate with spatial3d, we would first need to record
> lat/lon/z into doc values.  Somewhere I saw discussion about how we
> could stuff all 3 into a single long value with acceptable precision
> loss?  Or, we could use BinaryDocValues?  We need all 3 dims available
> to do the fast per-hit query time filtering.
> But, second: what do we index into the BKD tree?  Can we "just" index
> earth surface lat/lon, and then at query time is spatial3d able to
> give me an enclosing "surface lat/lon" bbox for a 3d shape?  Or
> ... must we index all 3 dimensions into the BKD tree (seems like this
> could be somewhat wasteful)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to