Hi all,

I have written a radians patch/branch and run some benchmarks on the GeoNames French database.
Benchmarks are 2k calls per run.

Radians:

run 1
Mean time with Grid : 4.808043092820513 ms
Mean time with Grid + Distance filter : 6.571108878461538 ms
Mean time with DoubleRange : 14.62661525128205 ms
Mean time with DoubleRange + Distance filter : 20.143597923076925 ms

run 2
Mean time with Grid : 5.290368523076923 ms
Mean time with Grid + Distance filter : 6.706567517435897 ms
Mean time with DoubleRange : 14.878960702564102 ms
Mean time with DoubleRange + Distance filter : 20.75806591948718 ms

Degrees:

run 1
Mean time with Grid : 5.101956610769231 ms
Mean time with Grid + Distance filter : 6.548685109230769 ms
Mean time with DoubleRange : 14.767478146153845 ms
Mean time with DoubleRange + Distance filter : 20.668063972820512 ms

run 2
Mean time with Grid : 4.683360031282051 ms
Mean time with Grid + Distance filter : 6.7065247435897435 ms
Mean time with DoubleRange : 14.617140157948716 ms
Mean time with DoubleRange + Distance filter : 20.074868595897435 ms

The radians branch is here for review:
https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-RADIANS

While moving from degrees to radians I saw that the DSL still needs some work. I shall focus on that now.

Niko

2012/5/3 Sanne Grinovero <sa...@hibernate.org>

> On May 3, 2012 10:10 AM, "Emmanuel Bernard" <emman...@hibernate.org> wrote:
> >
> > How come the DistanceFilter has to compute the distance for the whole corpus?
>
> You're right that it's not always the case, but it's possible. If there are more filters enabled and they are executed first, our filter will need to do the math only on the documents matched by the previous filters, but if there are no other constraints or filters our DistanceFilter might need to process all documents in all segments. This also happens when a limit is enabled on the collector - although limited to the current index segment - when the filter needs to be cached, as it needs to evaluate each document in the segment.
>
> In our case this DistanceFilter is only applied after RangeQuery was applied on both longitude and latitude, so I'm not sure if this is a big problem; personally I was just wondering, but I'd be fine with keeping this as a possible future improvement - but if we go for a separate issue, let's keep in mind that the index format would not be backwards compatible.
>
> >
> > By the way the actual storage (say via Hibernate ORM, or Infinispan) does not need to store in radians, so we don't need to do a conversion when reading an entity.
>
> Right, another reason to index only in whatever format makes querying more efficient.
>
> -- Sanne
>
> >
> > On 3 May 2012, at 10:45, Sanne Grinovero wrote:
> >
> > > The reason for my comment is that the code is doing a conversion to radians in the DistanceFilter, which needs to be extremely efficient as it's not only applied on the resultset but potentially on the whole corpus of all Documents in the index.
> > > So even if it's true that conversion would be needed on the final results, we always expect people to retrieve only a limited number of entities (like with pagination), while the index might need to perform this computation millions of times per query.
> > >
> > > If I look at the complexity of Point.getDistanceTo(double, double), I get a feeling that that method will hardly provide speedy queries because of the complex computations in it - this is just speculation at this point of course, to be sure we'd need to compare them with a large enough dataset, but it seems quite obvious that storing normalized radians should be more efficient as it would avoid a good deal of math to be executed on each Document in the index.
> > >
> > > Also if we assume people might want to use radians in their user data (I know some who definitely would never touch decimals for such a use case), there would be no need at all to convert the end result.
> > >
> > > Some more thoughts inline:
> > >
> > > On 3 May 2012 09:12, Nicolas Helleringer <nicolas.hellerin...@gmail.com> wrote:
> > >> Hi all,
> > >>
> > >> Sanne and I have been wondering about the way the spatial branch/module/functionality for Hibernate Search shall store its coordinates in the Lucene index.
> > >>
> > >> Today it is implemented with decimal degrees for:
> > >> - easy debugging/readability
> > >> - ease of conversion on storage, as we want to accept mainly decimal degrees from users' data
> > >
> > > Valid points, but consider that "storage" is going to be way slower anyway, and typically you'll process a Document to evaluate it for a hit many many orders of magnitude more frequently than the times you store it.
> > >
> > >>
> > >> Sanne pointed out that when the search is done there are quite a few conversions to radians for distance calculation, and suggested that we may store coordinates directly in their radians form.
> > >>
> > >> I have tried a patch to implement this, and as I was coding it I felt that the code was less readable, mainly in the coordinates normalisation, and that there were as many conversions as before.
> > >> Conversions had moved from search to import/export of coordinates in and out of the spatial module scope to user scope.
> > >
> > > I'm sure the number of points in the code in which they are converted won't change. I'm concerned about the cardinality of the collections on which it's applied ;)
> > > "Less readable" isn't nice, but we can work on that I guess?
> > >
> > >>
> > >> What the docs do not say (yet) is that we expect WGS 84 (a coordinate system) decimal degree coordinates as input, as these are quite a de facto standard (GPS devices output coordinates this way).
> > >
> > > How does it affect this?
> > >
> > >>
> > >> Today it is not the purpose of the Hibernate Search spatial initiative to handle projections. There are open source libs that handle that very well on the user side (Proj4j).
> > >>
> > >> So. The question is: shall we store as radians or decimal degrees?
> > >>
> > >> Niko
> > >>
> > >> P.S.: Hope it is clear. If not, ask for more.
> > >
> > > Thanks!
> > > Sanne
> > > _______________________________________________
> > > hibernate-dev mailing list
> > > hibernate-dev@lists.jboss.org
> > > https://lists.jboss.org/mailman/listinfo/hibernate-dev
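
For readers following the degrees-versus-radians question in this thread, here is a minimal sketch of where the per-document conversion would sit. It is not the code of Point.getDistanceTo(double, double) or of the HSEARCH-923-RADIANS branch: the class name HaversineSketch, both method names, the use of the haversine formula and the Earth radius constant are all assumptions made for illustration only.

```java
// Hypothetical illustration only: not the Hibernate Search implementation.
final class HaversineSketch {

    // Assumed mean Earth radius in kilometres.
    static final double EARTH_MEAN_RADIUS_KM = 6371.0088;

    // Great-circle distance when all four coordinates are already in radians,
    // as they would be if the index stored radians.
    static double distanceFromRadians(double lat1, double lon1, double lat2, double lon2) {
        double sinHalfDLat = Math.sin((lat2 - lat1) / 2);
        double sinHalfDLon = Math.sin((lon2 - lon1) / 2);
        double a = sinHalfDLat * sinHalfDLat
                + Math.cos(lat1) * Math.cos(lat2) * sinHalfDLon * sinHalfDLon;
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Same distance when coordinates are in decimal degrees: four extra
    // Math.toRadians calls per evaluated pair of points.
    static double distanceFromDegrees(double lat1, double lon1, double lat2, double lon2) {
        return distanceFromRadians(
                Math.toRadians(lat1), Math.toRadians(lon1),
                Math.toRadians(lat2), Math.toRadians(lon2));
    }

    public static void main(String[] args) {
        // Approximate coordinates of Paris and Lyon in decimal degrees.
        double km = distanceFromDegrees(48.8566, 2.3522, 45.7640, 4.8357);
        System.out.println("Paris-Lyon distance in km: " + km);
    }
}
```

In this sketch the only difference between the two entry points is the Math.toRadians calls; the trigonometric work is the same in both, and a query centre could be converted once per query rather than once per document.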