There, back and again ... After fixing a bug in grid search here are some updated results on 2k calls
Degrees : Mean time with Grid : 4.4897266425641025 ms. Average number of docs fetched : 2506.96 Mean time with Grid + Distance filter : 6.4930799487179485 ms. Average number of docs fetched : 425.33435897435896 Mean time with DoubleRange : 14.430638703076923 ms. Average number of docs fetched : 542.0410256410256 Mean time with DoubleRange + Distance filter : 20.483300545128206 ms. Average number of docs fetched : 425.33435897435896 Radians : Mean time with Grid : 5.650845744102564 ms. Average number of docs fetched : 5074.830769230769 Mean time with Grid + Distance filter : 8.627138825128204 ms. Average number of docs fetched : 426.7902564102564 Mean time with DoubleRange : 15.337755502564102 ms. Average number of docs fetched : 1087.705641025641 Mean time with DoubleRange + Distance filter : 20.82852138769231 ms. Average number of docs fetched : 426.7902564102564 Next thing I do not explain yet is the distance filter overhead mismatch : It is less on grid search with more docs to test than on DoubleRange. Niko 2012/5/7 Nicolas Helleringer <nicolas.hellerin...@gmail.com> > Here are some results : > > Mean time with Grid : 4.9297471630769225 ms. Average number of docs > fetched : 2416.373846153846 > Mean time with Grid + Distance filter : 6.48634534 ms. Average number of > docs fetched : 425.84 > Mean time with DoubleRange : 15.39593650051282 ms. Average number of docs > fetched : 542.72 > Mean time with DoubleRange + Distance filter : 21.158394677435897 ms. > Average number of docs fetched : 425.8779487179487 > > Sounds weird that with distance filter the two results are note the same. > I shall investigate that. > > Niko > > 2012/5/7 Emmanuel Bernard <emman...@hibernate.org> > >> Do you know the average amount of POI that were filtered in memory but >> the DistanceFilter during these runs? >> >> Emmanuel >> >> On 7 mai 2012, at 10:31, Nicolas Helleringer wrote: >> >> Hi all, >> >> I have done a radian patch/branch and some benchmarks on geonames french >> database. >> >> Benchs are on 2k calls each run. >> >> Radians: >> run 1 >> Mean time with Grid : 4.808043092820513 ms >> Mean time with Grid + Distance filter : 6.571108878461538 ms >> Mean time with DoubleRange : 14.62661525128205 ms >> Mean time with DoubleRange + Distance filter : 20.143597923076925 ms >> >> run 2 >> Mean time with Grid : 5.290368523076923 ms >> Mean time with Grid + Distance filter : 6.706567517435897 ms >> Mean time with DoubleRange : 14.878960702564102 ms >> Mean time with DoubleRange + Distance filter : 20.75806591948718 ms >> >> Degrees: >> run 1 >> Mean time with Grid : 5.101956610769231 ms >> Mean time with Grid + Distance filter : 6.548685109230769 ms >> Mean time with DoubleRange : 14.767478146153845 ms >> Mean time with DoubleRange + Distance filter : 20.668063972820512 ms >> >> run 2 >> Mean time with Grid : 4.683360031282051 ms >> Mean time with Grid + Distance filter : 6.7065247435897435 ms >> Mean time with DoubleRange : 14.617140157948716 ms >> Mean time with DoubleRange + Distance filter : 20.074868595897435 ms >> >> The radian branch is here for review : >> https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-RADIANS >> >> While moving from degrees to radians I have seen that DSL has still some >> work to do. >> I shall focus on that now. >> >> Niko >> >> 2012/5/3 Sanne Grinovero <sa...@hibernate.org> >> >>> >>> On May 3, 2012 10:10 AM, "Emmanuel Bernard" <emman...@hibernate.org> >>> wrote: >>> > >>> > How comes the DistanceFilter has to compute the distance for the whole >>> corpus? >>> >>> You're right in that's not always the case, but it's possible. If there >>> are more filters enabled and they are executed first, our filter will need >>> to do the math only on the matched documents by the previous filters, but >>> if there are no other constraints or filters our DistanceFilter might need >>> to process all documents in all segments. This happens also when a limit is >>> enabled on the collector - although limited to the current index segment - >>> when the filter needs to be cached as it needs to evaluate each document in >>> the segment. >>> >>> In our case this DistanceFilter is only applied after RangeQuery was >>> applied on both longitude and latitude, so I'm not sure if this is a big >>> problem; personally I was just wondering but I'd be fine in keeping this as >>> a possible future improvement - but if we go for a separate issue, let's >>> keep in mind that that the index format would not be backwards compatible. >>> >>> >>> > By the way the actual storage (say via Hibernate ORM, or Infinispan) >>> does not need to store in radian, so we don't need to do a conversion when >>> reading an entity. >>> >>> Right, another reason to index only in whatever format makes querying >>> more efficient. >>> >>> -- Sanne >>> >>> >>> > >>> > On 3 mai 2012, at 10:45, Sanne Grinovero wrote: >>> > >>> > > The reason for my comment is that the code is doing a conversion to >>> > > radians in the DistanceFilter, which needs to be extremely efficient >>> > > as it's not only applied on the resultset but potentially on the >>> whole >>> > > corpus of all Documents in the index. >>> > > So even if it's true that conversion would be needed on the final >>> > > results, we always expect people to retrieve only a limited amount of >>> > > entities (like with pagination), while the index might need to >>> perform >>> > > this computation millions of times per query. >>> > > >>> > > If I look at the complexity of Point.getDistanceTo(double, double), I >>> > > get a feeling that that method will hardly provide speedy queries >>> > > because of the complex computations in it - this is just speculation >>> > > at this point of course, to be sure we'd need to compare them with a >>> > > large enough dataset, but it seems quite obvious that storing >>> > > normalized radians should be more efficient as it would avoid a good >>> > > deal of math to be executed on each Document in the index. >>> > > >>> > > Also if we assume people might want to use radians in their user data >>> > > (I know some who definitely would never touch decimals for such a use >>> > > case), there would be no need at all to convert the end result. >>> > > >>> > > Some more thoughts inline: >>> > > >>> > > On 3 May 2012 09:12, Nicolas Helleringer < >>> nicolas.hellerin...@gmail.com> wrote: >>> > >> Hi all, >>> > >> >>> > >> Sanne and I have been wondering about the way the spatial >>> > >> branch/module/functionality for Hibernate Search shall store its >>> > >> coordinates in the Lucene index. >>> > >> >>> > >> Today it is implemented with decimal degree for : >>> > >> - easy debugging/readability >>> > >> - ease of conversion on storage as we want to accept mainly decimal >>> degree >>> > >> from users data >>> > > >>> > > Valid points, but consider that "storage" is going to be way slower >>> > > anyway, and typically you'll process a Document to evaluate it for a >>> > > hit many many orders of magnitude more frequently than the times you >>> > > store it. >>> > > >>> > >> >>> > >> Sanne pointed out that when the search is done there is quite a few >>> > >> conversion to radians for distance calculation and suggested that >>> we may >>> > >> store directly coordinates under their radians form. >>> > >> >>> > >> I have tried a patch to implement this and as I was coding it I >>> feel that >>> > >> the code was less readable, in the coordinates normalisation mainly >>> and >>> > >> that there was as many conversion as before. >>> > >> Conversions had moved from search to import / export of coordinates >>> in and >>> > >> out the spatial module scope to user scope. >>> > > >>> > > I'm sure the amount of points in the code in which they are converted >>> > > won't change. I'm concerned about the cardinality of the collections >>> > > on which it's applied ;) >>> > > "Less readable" isn't nice, but we can work on that I guess? >>> > > >>> > >> >>> > >> What the docs does not tell (yet), is that we are waiting for WGS >>> 84 (this >>> > >> is a coordinate system) decimal degree coordinates input, as these >>> are >>> > >> quite a de facto standard (GPS output this way). >>> > > >>> > > How does it affect this? >>> > > >>> > >> >>> > >> Today this is not the purpose of Hibernate Search spatial >>> initiative to >>> > >> handle projections. There are opensource libs to handle that on >>> user side >>> > >> very well (Proj4j) >>> > >> >>> > >> So. The question is : shall we store as radians or decimal degree ? >>> > >> >>> > >> Niko >>> > >> >>> > >> P.S : Hope it is clear. If not ask for more. >>> > > >>> > > Thanks! >>> > > Sanne >>> > > _______________________________________________ >>> > > hibernate-dev mailing list >>> > > hibernate-dev@lists.jboss.org >>> > > https://lists.jboss.org/mailman/listinfo/hibernate-dev >>> > >>> >> >> >> > _______________________________________________ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev