[ https://issues.apache.org/jira/browse/LUCENE-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647128#comment-14647128 ]
Nicholas Knize edited comment on LUCENE-6699 at 7/30/15 12:00 PM: ------------------------------------------------------------------ bq. The obvious alternative would be to store just lat/lon, exactly like is done for Nicholas's code Just to clarify, so there's no confusion, the attached code converts from lat/lon to ECEF (either full range in meters, or scaled to the unit spheroid) and stores the result in a 96 bit BitSet. GeoPointField stores encoded lat/lon (if that's what you were referring to). The attached was intended to be used for a Geo3d integration w/ GeoPointField. bq. I expect the conversion for every record would turn out to be more expensive. Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative conversion (2 sines, 2 cosines, 1 sqrt). Conversion cost vs. wateful representation is the interesting question, so I threw together a quick encoding/decoding benchmark (on my i7 16GB System76) and ran it on 100M points (converting from lla to ECEF and back). The law of large numbers took over at around 35M. For interest the results are provided: {noformat} Avg computation: 620.4319757666667 ns Trials: 30000000 Total time: 18612.959273 ms Avg computation: 621.3008362285715 ns Trials: 35000000 Total time: 21745.529268 ms Avg computation: 621.582647925 ns Trials: 40000000 Total time: 24863.305917 ms Avg computation: 621.3724589555555 ns Trials: 45000000 Total time: 27961.760653 ms Avg computation: 621.16271364 ns Trials: 50000000 Total time: 31058.135682 ms Avg computation: 621.3857686909091 ns Trials: 55000000 Total time: 34176.217278 ms Avg computation: 621.8110524 ns Trials: 60000000 Total time: 37308.663144 ms Avg computation: 621.6768083230769 ns Trials: 65000000 Total time: 40408.992541 ms Avg computation: 621.5324306714285 ns Trials: 70000000 Total time: 43507.270147 ms Avg computation: 621.4440536933333 ns Trials: 75000000 Total time: 46608.304027 ms Avg computation: 621.7594845875 ns Trials: 80000000 Total time: 49740.758767 ms Avg computation: 621.9327540705882 ns Trials: 85000000 Total time: 52864.284096 ms Avg computation: 621.9429434555556 ns Trials: 90000000 Total time: 55974.864911 ms Avg computation: 621.8868688947368 ns Trials: 95000000 Total time: 59079.252545 ms Avg computation: 621.98037608 ns Trials: 100000000 Total time: 62198.037608 ms {noformat} Again, those are both conversions to ECEF and back to LLA. So roughly 1 minute for 100M points. Halve those numbers and you have the cost for converting either direction. Next we could benchmark tree access and compare? I suspect traversal cost should be a function of the on-disk representation and chosen split method (that's where RTrees tend to become costly). Since BKD splits a sorted space, and it can exploit file system cache? I suspect there aren't many random seeks? Seems benchmarking might be relatively straightforward? was (Author: nknize): bq. The obvious alternative would be to store just lat/lon, exactly like is done for Nicholas's code Just to clarify, so there's no confusion, the attached code converts from lat/lon to ECEF (either full range in meters, or scaled to the unit spheroid) and stores the result in a 96 bit BitSet. GeoPointField stores encoded lat/lon (if that's what you were referring to). The attached was intended to be used for a Geo3d integration w/ GeoPointField. bq. I expect the conversion for every record would turn out to be more expensive. Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative conversion (2 sines, 2 cosines, 1 sqrt). Conversion cost vs. wateful representation is the interesting question, so I threw together a quick encoding/decoding benchmark (on my i7 16GB System76) and ran it on 1B points (converting from lla to ECEF and back). The law of large numbers took over at around 350M. For interest the results are provided: {noformat} Avg computation: 620.4319757666667 ns Trials: 30000000 Total time: 18612.959273 ms Avg computation: 621.3008362285715 ns Trials: 35000000 Total time: 21745.529268 ms Avg computation: 621.582647925 ns Trials: 40000000 Total time: 24863.305917 ms Avg computation: 621.3724589555555 ns Trials: 45000000 Total time: 27961.760653 ms Avg computation: 621.16271364 ns Trials: 50000000 Total time: 31058.135682 ms Avg computation: 621.3857686909091 ns Trials: 55000000 Total time: 34176.217278 ms Avg computation: 621.8110524 ns Trials: 60000000 Total time: 37308.663144 ms Avg computation: 621.6768083230769 ns Trials: 65000000 Total time: 40408.992541 ms Avg computation: 621.5324306714285 ns Trials: 70000000 Total time: 43507.270147 ms Avg computation: 621.4440536933333 ns Trials: 75000000 Total time: 46608.304027 ms Avg computation: 621.7594845875 ns Trials: 80000000 Total time: 49740.758767 ms Avg computation: 621.9327540705882 ns Trials: 85000000 Total time: 52864.284096 ms Avg computation: 621.9429434555556 ns Trials: 90000000 Total time: 55974.864911 ms Avg computation: 621.8868688947368 ns Trials: 95000000 Total time: 59079.252545 ms Avg computation: 621.98037608 ns Trials: 100000000 Total time: 62198.037608 ms {noformat} Again, those are both conversions to ECEF and back to LLA. So roughly 1 minute for 1B points. Halve those numbers and you have the cost for converting either direction. Next we could benchmark tree access and compare? I suspect traversal cost should be a function of the on-disk representation and chosen split method (that's where RTrees tend to become costly). Since BKD splits a sorted space, and it can exploit file system cache? I suspect there aren't many random seeks? Seems benchmarking might be relatively straightforward? > Integrate lat/lon BKD and spatial3d > ----------------------------------- > > Key: LUCENE-6699 > URL: https://issues.apache.org/jira/browse/LUCENE-6699 > Project: Lucene - Core > Issue Type: New Feature > Reporter: Michael McCandless > Assignee: Michael McCandless > Attachments: Geo3DPacking.java > > > I'm opening this for discussion, because I'm not yet sure how to do > this integration, because of my ignorance about spatial in general and > spatial3d in particular :) > Our BKD tree impl is very fast at doing lat/lon shape intersection > (bbox, polygon, soon distance: LUCENE-6698) against previously indexed > points. > I think to integrate with spatial3d, we would first need to record > lat/lon/z into doc values. Somewhere I saw discussion about how we > could stuff all 3 into a single long value with acceptable precision > loss? Or, we could use BinaryDocValues? We need all 3 dims available > to do the fast per-hit query time filtering. > But, second: what do we index into the BKD tree? Can we "just" index > earth surface lat/lon, and then at query time is spatial3d able to > give me an enclosing "surface lat/lon" bbox for a 3d shape? Or > ... must we index all 3 dimensions into the BKD tree (seems like this > could be somewhat wasteful)? -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org