> On a side note, what do you need System.exit(0); for? You should close the SessionFactory.

Because I'm better with geo/data than with code =) Thanks for pointing me in the right direction.
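On the System.exit(0) question: here is a minimal sketch of the cleanup pattern Sanne suggests, which closes the factory in a finally block so the JVM can shut down normally. FakeSessionFactory is a hypothetical stand-in that lets the example run without Hibernate on the classpath. In the bench it would be the real org.hibernate.SessionFactory, which exposes close().

```java
// Sketch only: FakeSessionFactory is a hypothetical stand-in so the example
// compiles without Hibernate; in the bench you would call close() on the
// real org.hibernate.SessionFactory instead.
class BenchMain {

    /** Hypothetical stand-in for org.hibernate.SessionFactory. */
    static final class FakeSessionFactory {
        private boolean closed;

        boolean isClosed() {
            return closed;
        }

        void close() {
            // A real SessionFactory releases its resources here
            // (caches, connection pools, etc.).
            closed = true;
        }
    }

    /** Runs the (elided) benchmark and always closes the factory afterwards. */
    static FakeSessionFactory runBench() {
        FakeSessionFactory factory = new FakeSessionFactory();
        try {
            // ... run the benchmark queries here ...
        } finally {
            // Explicit cleanup instead of System.exit(0).
            factory.close();
        }
        return factory;
    }

    public static void main(String[] args) {
        FakeSessionFactory factory = runBench();
        System.out.println("factory closed: " + factory.isClosed());
        // No System.exit(0): main simply returns.
    }
}
```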
The last series of numbers comes from a 50k-call run in radian mode that lasted 45 minutes. For each center the bench runs the 4 request modes, which comes to ~45 ms per loop.

Niko

> Cheers,
> Sanne
>
> On 15 May 2012 14:04, Nicolas Helleringer <nicolas.hellerin...@gmail.com> wrote:
> > I did the seed on the random generator.
> >
> > Here are some results:
> >
> > Degrees 2K calls
> > Mean time with Grid : 4.769457488717949 ms. Average number of docs fetched : 2524.982564102564
> > Mean time with Grid + Distance filter : 6.501712946153845 ms. Average number of docs fetched : 426.1876923076923
> > Mean time with DoubleRange : 14.336663392307692 ms. Average number of docs fetched : 543.6035897435897
> > Mean time with DoubleRange + Distance filter : 19.7123163574359 ms. Average number of docs fetched : 426.1876923076923
> >
> > Radians 2K calls
> > Mean time with Grid : 4.430686068205128 ms. Average number of docs fetched : 2524.982564102564
> > Mean time with Grid + Distance filter : 6.717519717948718 ms. Average number of docs fetched : 426.1876923076923
> > Mean time with DoubleRange : 14.35186034 ms. Average number of docs fetched : 543.6035897435897
> > Mean time with DoubleRange + Distance filter : 20.073972284102563 ms. Average number of docs fetched : 426.1876923076923
> >
> > Radians 50k calls
> > Mean time with Grid : 4.440979528643216 ms. Average number of docs fetched : 2459.169386934673
> > Mean time with Grid + Distance filter : 6.722681398331658 ms. Average number of docs fetched : 416.2335879396985
> > Mean time with DoubleRange : 14.532376860201005 ms. Average number of docs fetched : 530.2923618090452
> > Mean time with DoubleRange + Distance filter : 20.21980649284422 ms. Average number of docs fetched : 416.2335879396985
> >
> > As for the randomness, you can see from the average number of docs in the 2k-call runs that the seed did its job: the requests are the same.
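The thread suggests seeding the random generator with a constant, and that is what makes the 2k-call runs comparable. Here is a minimal sketch, assuming query centers are drawn uniformly from a bounding box. The seed, the box and the class name are illustrative and are not taken from the bench code. Two java.util.Random instances built with the same seed return identical sequences, so both branches receive exactly the same requests.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch: reproducible benchmark query centers from a fixed seed.
// The seed value, bounding box and class name are illustrative assumptions,
// not taken from the HSEARCH-923 bench code.
class SeededCenters {

    /** Returns count (latitude, longitude) pairs in decimal degrees inside the box. */
    static double[][] centers(long seed, int count,
                              double minLat, double maxLat,
                              double minLon, double maxLon) {
        Random random = new Random(seed); // same seed => same sequence of values
        double[][] result = new double[count][2];
        for (int i = 0; i < count; i++) {
            result[i][0] = minLat + random.nextDouble() * (maxLat - minLat);
            result[i][1] = minLon + random.nextDouble() * (maxLon - minLon);
        }
        return result;
    }

    public static void main(String[] args) {
        // Rough box around metropolitan France (assumed, for illustration only).
        double[][] run1 = centers(42L, 2000, 41.0, 51.5, -5.5, 9.5);
        double[][] run2 = centers(42L, 2000, 41.0, 51.5, -5.5, 9.5);
        System.out.println("identical runs: " + Arrays.deepEquals(run1, run2));
    }
}
```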
> >
> > As you can see, there is not much of a difference between the 2k-call and 50k-call runs.
> >
> > I have also investigated the overhead of the distance filter compared with the double range approach. I suspect that wrapping the lat/long range query in a QueryWrapperFilter is costly, but I cannot prove it yet.
> >
> > Back to the main question: does radian storage give better performance? I cannot say with my test environment; the results seem pretty close to me. Maybe someone could run the bench on a different environment.
> >
> > Niko
> >
> > PS: both branches are up to date on my GitHub:
> > https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923 &
> > https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-RADIANS
> >
> > 2012/5/14 Nicolas Helleringer <nicolas.hellerin...@gmail.com>
> >>>
> >>> maybe even simpler: set a constant as the seed of your random
> >>> generator; it should provide a reproducible sequence of values.
> >>
> >> /facepalm
> >> I should have guessed that :s
> >>
> >> Niko
> >>
> >>>
> >>> >>
> >>> >> On 11 May 2012 08:40, Nicolas Helleringer <nicolas.hellerin...@gmail.com> wrote:
> >>> >> > There, back and again ...
> >>> >> >
> >>> >> > After fixing a bug in grid search, here are some updated results on 2k calls:
> >>> >> >
> >>> >> > Degrees:
> >>> >> > Mean time with Grid : 4.4897266425641025 ms. Average number of docs fetched : 2506.96
> >>> >> > Mean time with Grid + Distance filter : 6.4930799487179485 ms. Average number of docs fetched : 425.33435897435896
> >>> >> > Mean time with DoubleRange : 14.430638703076923 ms. Average number of docs fetched : 542.0410256410256
> >>> >> > Mean time with DoubleRange + Distance filter : 20.483300545128206 ms.
> >>> >> > Average number of docs fetched : 425.33435897435896
> >>> >> >
> >>> >> > Radians:
> >>> >> > Mean time with Grid : 5.650845744102564 ms. Average number of docs fetched : 5074.830769230769
> >>> >> > Mean time with Grid + Distance filter : 8.627138825128204 ms. Average number of docs fetched : 426.7902564102564
> >>> >> > Mean time with DoubleRange : 15.337755502564102 ms. Average number of docs fetched : 1087.705641025641
> >>> >> > Mean time with DoubleRange + Distance filter : 20.82852138769231 ms. Average number of docs fetched : 426.7902564102564
> >>> >> >
> >>> >> > The next thing I cannot explain yet is the mismatch in distance filter overhead: it is lower on grid search, which has more docs to test, than on DoubleRange.
> >>> >> >
> >>> >> > Niko
> >>> >> >
> >>> >> >
> >>> >> > 2012/5/7 Nicolas Helleringer <nicolas.hellerin...@gmail.com>
> >>> >> >>
> >>> >> >> Here are some results:
> >>> >> >>
> >>> >> >> Mean time with Grid : 4.9297471630769225 ms. Average number of docs fetched : 2416.373846153846
> >>> >> >> Mean time with Grid + Distance filter : 6.48634534 ms. Average number of docs fetched : 425.84
> >>> >> >> Mean time with DoubleRange : 15.39593650051282 ms. Average number of docs fetched : 542.72
> >>> >> >> Mean time with DoubleRange + Distance filter : 21.158394677435897 ms. Average number of docs fetched : 425.8779487179487
> >>> >> >>
> >>> >> >> It seems weird that the two results with the distance filter are not the same. I shall investigate that.
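Here is a rough back-of-the-envelope check on the "docs fetched" numbers. It assumes that POIs are spread roughly uniformly inside each query's bounding box, and that the box is approximately the square of side 2r around the search circle of radius r. Under those assumptions the distance filter should keep about π/4 of the DoubleRange candidates, and the reported degree averages are very close to that ratio:

```latex
\frac{N_{\text{DoubleRange + Distance}}}{N_{\text{DoubleRange}}}
  \approx \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4} \approx 0.785,
\qquad
\frac{425.88}{542.72} \approx 0.785,
\qquad
\frac{425.33}{542.04} \approx 0.785
```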
> >>> >> >>
> >>> >> >> Niko
> >>> >> >>
> >>> >> >> 2012/5/7 Emmanuel Bernard <emman...@hibernate.org>
> >>> >> >>>
> >>> >> >>> Do you know the average number of POIs that were filtered in memory by the DistanceFilter during these runs?
> >>> >> >>>
> >>> >> >>> Emmanuel
> >>> >> >>>
> >>> >> >>> On 7 May 2012, at 10:31, Nicolas Helleringer wrote:
> >>> >> >>>
> >>> >> >>> Hi all,
> >>> >> >>>
> >>> >> >>> I have done a radian patch/branch and run some benchmarks on the GeoNames French database.
> >>> >> >>>
> >>> >> >>> Each benchmark run makes 2k calls.
> >>> >> >>>
> >>> >> >>> Radians:
> >>> >> >>> run 1
> >>> >> >>> Mean time with Grid : 4.808043092820513 ms
> >>> >> >>> Mean time with Grid + Distance filter : 6.571108878461538 ms
> >>> >> >>> Mean time with DoubleRange : 14.62661525128205 ms
> >>> >> >>> Mean time with DoubleRange + Distance filter : 20.143597923076925 ms
> >>> >> >>>
> >>> >> >>> run 2
> >>> >> >>> Mean time with Grid : 5.290368523076923 ms
> >>> >> >>> Mean time with Grid + Distance filter : 6.706567517435897 ms
> >>> >> >>> Mean time with DoubleRange : 14.878960702564102 ms
> >>> >> >>> Mean time with DoubleRange + Distance filter : 20.75806591948718 ms
> >>> >> >>>
> >>> >> >>> Degrees:
> >>> >> >>> run 1
> >>> >> >>> Mean time with Grid : 5.101956610769231 ms
> >>> >> >>> Mean time with Grid + Distance filter : 6.548685109230769 ms
> >>> >> >>> Mean time with DoubleRange : 14.767478146153845 ms
> >>> >> >>> Mean time with DoubleRange + Distance filter : 20.668063972820512 ms
> >>> >> >>>
> >>> >> >>> run 2
> >>> >> >>> Mean time with Grid : 4.683360031282051 ms
> >>> >> >>> Mean time with Grid + Distance filter : 6.7065247435897435 ms
> >>> >> >>> Mean time with DoubleRange : 14.617140157948716 ms
> >>> >> >>> Mean time with DoubleRange + Distance filter : 20.074868595897435 ms
> >>> >> >>>
> >>> >> >>> The radian branch is here for review:
> >>> >> >>> https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-RADIANS
> >>> >> >>>
> >>> >> >>> While moving from degrees to radians, I noticed that the DSL still has some work to do. I shall focus on that now.
> >>> >> >>>
> >>> >> >>> Niko
> >>> >> >>>
> >>> >> >>> 2012/5/3 Sanne Grinovero <sa...@hibernate.org>
> >>> >> >>>>
> >>> >> >>>> On May 3, 2012 10:10 AM, "Emmanuel Bernard" <emman...@hibernate.org> wrote:
> >>> >> >>>> >
> >>> >> >>>> > How come the DistanceFilter has to compute the distance for the whole corpus?
> >>> >> >>>>
> >>> >> >>>> You're right that it's not always the case, but it's possible. If there are more filters enabled and they are executed first, our filter will only need to do the math on the documents matched by the previous filters; but if there are no other constraints or filters, our DistanceFilter might need to process all documents in all segments. This also happens when a limit is enabled on the collector (although limited to the current index segment) if the filter needs to be cached, as it then has to evaluate each document in the segment.
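Sanne's point about filter order can be illustrated outside Lucene. The following is a minimal sketch that assumes an in-memory list of POIs and a haversine distance. Poi, search() and the counter are hypothetical, and the code is not the Hibernate Search DistanceFilter. With no cheaper filter in front, the distance is computed for every document. With a latitude/longitude range check first, as with the RangeQuery mentioned in the thread, only the documents that pass the check pay for the trigonometry.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of why filter order matters for an expensive distance check.
// Poi, search() and the counter are hypothetical: this is not the Hibernate
// Search DistanceFilter, just the same idea applied to an in-memory list.
class FilterOrdering {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** A point of interest with coordinates in decimal degrees. */
    static final class Poi {
        final double lat;
        final double lon;

        Poi(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Counts haversine evaluations so the two strategies can be compared. */
    static int distanceComputations = 0;

    /** Haversine great-circle distance in kilometres. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        distanceComputations++;
        double sinDLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinDLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinDLat * sinDLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinDLon * sinDLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Returns the POIs within radiusKm of (lat, lon). With boxFirst, a cheap
     * latitude/longitude range check runs first, so the distance is only
     * computed for documents that survive it.
     */
    static List<Poi> search(List<Poi> docs, double lat, double lon, double radiusKm, boolean boxFirst) {
        double angular = radiusKm / EARTH_RADIUS_KM;
        // Bounding box of the circle; valid away from the poles and the antimeridian.
        double dLat = Math.toDegrees(angular);
        double dLon = Math.toDegrees(Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        List<Poi> hits = new ArrayList<>();
        for (Poi p : docs) {
            if (boxFirst && (Math.abs(p.lat - lat) > dLat || Math.abs(p.lon - lon) > dLon)) {
                continue; // rejected by the range check, no trigonometry needed
            }
            if (distanceKm(lat, lon, p.lat, p.lon) <= radiusKm) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Synthetic 100 x 100 grid of POIs, 0.01 degree apart.
        List<Poi> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 100; j++) {
                docs.add(new Poi(45.0 + i * 0.01, 2.0 + j * 0.01));
            }
        }
        for (boolean boxFirst : new boolean[] {false, true}) {
            distanceComputations = 0;
            int hits = search(docs, 45.5, 2.5, 10.0, boxFirst).size();
            System.out.println("boxFirst=" + boxFirst + " hits=" + hits
                    + " distance computations=" + distanceComputations);
        }
    }
}
```

On the synthetic grid in main(), the full scan performs 10,000 distance computations. The box-first variant evaluates only the few hundred points inside the bounding box and returns the same hits.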
> >>> >> >>>>
> >>> >> >>>> In our case this DistanceFilter is only applied after a RangeQuery has been applied on both longitude and latitude, so I'm not sure this is a big problem. Personally I was just wondering, and I'd be fine keeping this as a possible future improvement; but if we go for a separate issue, let's keep in mind that the index format would not be backwards compatible.
> >>> >> >>>>
> >>> >> >>>> > By the way, the actual storage (say via Hibernate ORM, or Infinispan) does not need to store in radians, so we don't need to do a conversion when reading an entity.
> >>> >> >>>>
> >>> >> >>>> Right, another reason to index only in whatever format makes querying more efficient.
> >>> >> >>>>
> >>> >> >>>> -- Sanne
> >>> >> >>>>
> >>> >> >>>> >
> >>> >> >>>> > On 3 May 2012, at 10:45, Sanne Grinovero wrote:
> >>> >> >>>> >
> >>> >> >>>> > > The reason for my comment is that the code is doing a conversion to radians in the DistanceFilter, which needs to be extremely efficient because it's applied not only to the result set but potentially to the whole corpus of all Documents in the index.
> >>> >> >>>> > > So even if it's true that conversion would be needed on the final results, we always expect people to retrieve only a limited number of entities (like with pagination), while the index might need to perform this computation millions of times per query.
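Here is a minimal sketch that makes Sanne's argument concrete. The names are hypothetical, and a mean Earth radius of 6371 km is assumed. The haversine distance converts the query point to radians once per query. Document coordinates are then either converted on every evaluation (degrees in the index) or used directly (radians in the index).

```java
// Sketch contrasting the two index formats discussed in the thread. Names are
// hypothetical; both variants compute the same haversine distance, and the
// only difference is whether stored document coordinates are converted per call.
class RadianStorage {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (assumed)

    /** Haversine great-circle distance in km; all arguments in radians. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double sinDLat = Math.sin((lat2 - lat1) / 2);
        double sinDLon = Math.sin((lon2 - lon1) / 2);
        double a = sinDLat * sinDLat + Math.cos(lat1) * Math.cos(lat2) * sinDLon * sinDLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Index stores decimal degrees: two extra conversions for every document. */
    static double distanceToDegreesDoc(double queryLatRad, double queryLonRad,
                                       double docLatDeg, double docLonDeg) {
        return haversineKm(queryLatRad, queryLonRad,
                Math.toRadians(docLatDeg), Math.toRadians(docLonDeg));
    }

    /** Index stores radians: the document values are used as-is. */
    static double distanceToRadiansDoc(double queryLatRad, double queryLonRad,
                                       double docLatRad, double docLonRad) {
        return haversineKm(queryLatRad, queryLonRad, docLatRad, docLonRad);
    }

    public static void main(String[] args) {
        // The query point is converted once per query, not once per document.
        double qLat = Math.toRadians(0.0);
        double qLon = Math.toRadians(0.0);
        // One degree of latitude along a meridian, computed both ways.
        System.out.println(distanceToDegreesDoc(qLat, qLon, 1.0, 0.0));
        System.out.println(distanceToRadiansDoc(qLat, qLon, Math.toRadians(1.0), 0.0));
    }
}
```

Math.toRadians costs only a couple of floating-point operations, so in this sketch the per-document saving is small compared with the sin, cos and asin calls. That would be consistent with the close degree and radian timings reported in this thread, although the real Point.getDistanceTo may do more work than this.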
> >>> >> >>>> > >
> >>> >> >>>> > > If I look at the complexity of Point.getDistanceTo(double, double), I get the feeling that this method will hardly provide speedy queries because of the complex computations in it. This is just speculation at this point, of course; to be sure we'd need to compare them on a large enough dataset. But it seems quite obvious that storing normalized radians should be more efficient, as it would avoid executing a good deal of math on each Document in the index.
> >>> >> >>>> > >
> >>> >> >>>> > > Also, if we assume people might want to use radians in their user data (I know some who would definitely never touch decimals for such a use case), there would be no need at all to convert the end result.
> >>> >> >>>> > >
> >>> >> >>>> > > Some more thoughts inline:
> >>> >> >>>> > >
> >>> >> >>>> > > On 3 May 2012 09:12, Nicolas Helleringer <nicolas.hellerin...@gmail.com> wrote:
> >>> >> >>>> > >> Hi all,
> >>> >> >>>> > >>
> >>> >> >>>> > >> Sanne and I have been wondering how the spatial branch/module/functionality for Hibernate Search should store its coordinates in the Lucene index.
> >>> >> >>>> > >>
> >>> >> >>>> > >> Today it is implemented with decimal degrees for:
> >>> >> >>>> > >> - easy debugging/readability
> >>> >> >>>> > >> - ease of conversion on storage, as we mainly want to accept decimal degrees from user data
> >>> >> >>>> > >
> >>> >> >>>> > > Valid points, but consider that "storage" is going to be way slower anyway, and typically you'll process a Document to evaluate it for a hit many, many orders of magnitude more frequently than you store it.
> >>> >> >>>> > >
> >>> >> >>>> > >>
> >>> >> >>>> > >> Sanne pointed out that when the search is done there are quite a few conversions to radians for the distance calculation, and suggested that we could store coordinates directly in radians.
> >>> >> >>>> > >>
> >>> >> >>>> > >> I have tried a patch to implement this, and as I was coding it I felt that the code was less readable, mainly in the coordinate normalisation, and that there were as many conversions as before. The conversions had moved from search to the import/export of coordinates between the spatial module scope and the user scope.
> >>> >> >>>> > >
> >>> >> >>>> > > I'm sure the number of places in the code where they are converted won't change. I'm concerned about the cardinality of the collections on which it's applied ;)
> >>> >> >>>> > > "Less readable" isn't nice, but we can work on that, I guess?
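Niko found the radians variant less readable, mainly in coordinate normalisation. Here is a minimal sketch of longitude wrapping in both units, using hypothetical helper names that are not from either branch. It suggests the logic can keep the same shape in both units, with only the period constant changing.

```java
// Hypothetical helpers (not taken from the HSEARCH-923 branches) showing that
// longitude normalisation has the same shape in both units; only the period
// (360 degrees vs 2*PI radians) changes.
class LongitudeNormalization {

    /** Wraps a longitude in decimal degrees into [-180, 180), up to rounding. */
    static double normalizeDegrees(double lon) {
        return lon - 360.0 * Math.floor((lon + 180.0) / 360.0);
    }

    /** Wraps a longitude in radians into [-PI, PI), up to rounding. */
    static double normalizeRadians(double lon) {
        double period = 2 * Math.PI;
        return lon - period * Math.floor((lon + Math.PI) / period);
    }

    public static void main(String[] args) {
        System.out.println(normalizeDegrees(190.0));         // 190 degrees east is 170 degrees west
        System.out.println(normalizeRadians(1.5 * Math.PI)); // about -PI/2
    }
}
```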
> >>> >> >>>> > >
> >>> >> >>>> > >>
> >>> >> >>>> > >> What the docs do not say (yet) is that we expect WGS 84 (a coordinate system) decimal-degree coordinates as input, as these are quite a de facto standard (GPS devices output them this way).
> >>> >> >>>> > >
> >>> >> >>>> > > How does it affect this?
> >>> >> >>>> > >
> >>> >> >>>> > >>
> >>> >> >>>> > >> Handling projections is not the purpose of the Hibernate Search spatial initiative today. There are open-source libraries that handle that very well on the user side (Proj4j).
> >>> >> >>>> > >>
> >>> >> >>>> > >> So, the question is: shall we store coordinates as radians or as decimal degrees?
> >>> >> >>>> > >>
> >>> >> >>>> > >> Niko
> >>> >> >>>> > >>
> >>> >> >>>> > >> P.S.: Hope it is clear. If not, ask for more.
> >>> >> >>>> > >
> >>> >> >>>> > > Thanks!
> >>> >> >>>> > > Sanne
> >>> >> >>>> > > _______________________________________________
> >>> >> >>>> > > hibernate-dev mailing list
> >>> >> >>>> > > hibernate-dev@lists.jboss.org
> >>> >> >>>> > > https://lists.jboss.org/mailman/listinfo/hibernate-dev