> > Maybe even simpler: set a constant as the seed of your random generator. That should provide a reproducible sequence of values.
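A minimal sketch of the seeding suggestion above, assuming the benchmark draws its query points from `java.util.Random`. The class name `SeededQueryPoints`, the seed value and the world-wide coordinate ranges are hypothetical and are not taken from the actual benchmark code.

```java
import java.util.Arrays;
import java.util.Random;

class SeededQueryPoints {
    // Hypothetical fixed seed: any constant works, it only has to stay the same between runs.
    static final long SEED = 42L;

    /** Returns {latitude, longitude} pairs in decimal degrees drawn from a seeded generator. */
    static double[][] randomPoints(long seed, int count) {
        Random random = new Random(seed);
        double[][] points = new double[count][2];
        for (int i = 0; i < count; i++) {
            points[i][0] = -90.0 + 180.0 * random.nextDouble();   // latitude in [-90, 90)
            points[i][1] = -180.0 + 360.0 * random.nextDouble();  // longitude in [-180, 180)
        }
        return points;
    }

    public static void main(String[] args) {
        // Two generators built from the same seed yield the same 2k query points,
        // so every benchmark run can be fed identical input.
        double[][] first = randomPoints(SEED, 2000);
        double[][] second = randomPoints(SEED, 2000);
        System.out.println(Arrays.deepEquals(first, second)); // prints "true"
    }
}
```

With identical input on every run, differences between runs come from the code under test and from timing noise rather than from a different sample of points.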
/facepalm I should have guessed that :s

Niko

> >> On 11 May 2012 08:40, Nicolas Helleringer <nicolas.hellerin...@gmail.com> wrote:
> >> > There, back and again ...
> >> >
> >> > After fixing a bug in grid search, here are some updated results on 2k calls.
> >> >
> >> > Degrees:
> >> > Mean time with Grid : 4.4897266425641025 ms. Average number of docs fetched : 2506.96
> >> > Mean time with Grid + Distance filter : 6.4930799487179485 ms. Average number of docs fetched : 425.33435897435896
> >> > Mean time with DoubleRange : 14.430638703076923 ms. Average number of docs fetched : 542.0410256410256
> >> > Mean time with DoubleRange + Distance filter : 20.483300545128206 ms. Average number of docs fetched : 425.33435897435896
> >> >
> >> > Radians:
> >> > Mean time with Grid : 5.650845744102564 ms. Average number of docs fetched : 5074.830769230769
> >> > Mean time with Grid + Distance filter : 8.627138825128204 ms. Average number of docs fetched : 426.7902564102564
> >> > Mean time with DoubleRange : 15.337755502564102 ms. Average number of docs fetched : 1087.705641025641
> >> > Mean time with DoubleRange + Distance filter : 20.82852138769231 ms. Average number of docs fetched : 426.7902564102564
> >> >
> >> > The next thing I cannot explain yet is the distance filter overhead mismatch: it is lower on grid search, which has more docs to test, than on DoubleRange.
> >> >
> >> > Niko
> >> >
> >> >
> >> > 2012/5/7 Nicolas Helleringer <nicolas.hellerin...@gmail.com>
> >> >>
> >> >> Here are some results:
> >> >>
> >> >> Mean time with Grid : 4.9297471630769225 ms. Average number of docs fetched : 2416.373846153846
> >> >> Mean time with Grid + Distance filter : 6.48634534 ms. Average number of docs fetched : 425.84
> >> >> Mean time with DoubleRange : 15.39593650051282 ms. Average number of docs fetched : 542.72
> >> >> Mean time with DoubleRange + Distance filter : 21.158394677435897 ms. Average number of docs fetched : 425.8779487179487
> >> >>
> >> >> It sounds weird that with the distance filter the two results are not the same.
> >> >> I shall investigate that.
> >> >>
> >> >> Niko
> >> >>
> >> >> 2012/5/7 Emmanuel Bernard <emman...@hibernate.org>
> >> >>>
> >> >>> Do you know the average amount of POI that were filtered in memory by the DistanceFilter during these runs?
> >> >>>
> >> >>> Emmanuel
> >> >>>
> >> >>> On 7 May 2012, at 10:31, Nicolas Helleringer wrote:
> >> >>>
> >> >>> Hi all,
> >> >>>
> >> >>> I have done a radian patch/branch and some benchmarks on the GeoNames French database.
> >> >>>
> >> >>> Benchmarks are on 2k calls each run.
> >> >>>
> >> >>> Radians:
> >> >>> run 1
> >> >>> Mean time with Grid : 4.808043092820513 ms
> >> >>> Mean time with Grid + Distance filter : 6.571108878461538 ms
> >> >>> Mean time with DoubleRange : 14.62661525128205 ms
> >> >>> Mean time with DoubleRange + Distance filter : 20.143597923076925 ms
> >> >>>
> >> >>> run 2
> >> >>> Mean time with Grid : 5.290368523076923 ms
> >> >>> Mean time with Grid + Distance filter : 6.706567517435897 ms
> >> >>> Mean time with DoubleRange : 14.878960702564102 ms
> >> >>> Mean time with DoubleRange + Distance filter : 20.75806591948718 ms
> >> >>>
> >> >>> Degrees:
> >> >>> run 1
> >> >>> Mean time with Grid : 5.101956610769231 ms
> >> >>> Mean time with Grid + Distance filter : 6.548685109230769 ms
> >> >>> Mean time with DoubleRange : 14.767478146153845 ms
> >> >>> Mean time with DoubleRange + Distance filter : 20.668063972820512 ms
> >> >>>
> >> >>> run 2
> >> >>> Mean time with Grid : 4.683360031282051 ms
> >> >>> Mean time with Grid + Distance filter : 6.7065247435897435 ms
> >> >>> Mean time with DoubleRange : 14.617140157948716 ms
> >> >>> Mean time with DoubleRange + Distance filter : 20.074868595897435 ms
> >> >>>
> >> >>> The radian branch is here for review:
> >> >>> https://github.com/nicolashelleringer/hibernate-search/tree/HSEARCH-923-RADIANS
> >> >>>
> >> >>> While moving from degrees to radians I have seen that the DSL still has some work to do.
> >> >>> I shall focus on that now.
> >> >>>
> >> >>> Niko
> >> >>>
> >> >>> 2012/5/3 Sanne Grinovero <sa...@hibernate.org>
> >> >>>>
> >> >>>>
> >> >>>> On May 3, 2012 10:10 AM, "Emmanuel Bernard" <emman...@hibernate.org> wrote:
> >> >>>> >
> >> >>>> > How come the DistanceFilter has to compute the distance for the whole corpus?
> >> >>>>
> >> >>>> You're right that it's not always the case, but it's possible. If there are more filters enabled and they are executed first, our filter will need to do the math only on the documents matched by the previous filters, but if there are no other constraints or filters our DistanceFilter might need to process all documents in all segments. This happens also when a limit is enabled on the collector - although limited to the current index segment - when the filter needs to be cached, as it needs to evaluate each document in the segment.
> >> >>>>
> >> >>>> In our case this DistanceFilter is only applied after RangeQuery was applied on both longitude and latitude, so I'm not sure if this is a big problem; personally I was just wondering, but I'd be fine with keeping this as a possible future improvement - but if we go for a separate issue, let's keep in mind that the index format would not be backwards compatible.
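The ordering described in the message above (a cheap range test on latitude and longitude first, the exact distance computation only on what survives) can be sketched in plain Java. This is an illustration under stated assumptions, not Hibernate Search or Lucene code: the class `TwoStageSpatialFilter`, the methods `rangeCandidates` and `withinRadius`, the haversine formula, the Earth radius value and the sample coordinates are all hypothetical. The bounding box ignores the antimeridian and assumes a radius well below a quarter of the Earth's circumference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoStageSpatialFilter {
    // Mean Earth radius in kilometres; an approximation chosen for this sketch.
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle (haversine) distance between two points given in decimal degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDPhi = Math.sin((phi2 - phi1) / 2);
        double sinHalfDLambda = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDPhi * sinHalfDPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLambda * sinHalfDLambda;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Stage 1: cheap range test on latitude and longitude (a bounding box around the center). */
    static List<double[]> rangeCandidates(List<double[]> points,
                                          double centerLat, double centerLon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM;          // search radius as an angle, in radians
        double latDelta = Math.toDegrees(angular);
        double ratio = Math.sin(angular) / Math.cos(Math.toRadians(centerLat));
        // Close to a pole the box has to span every longitude.
        double lonDelta = ratio >= 1.0 ? 180.0 : Math.toDegrees(Math.asin(ratio));
        List<double[]> candidates = new ArrayList<>();
        for (double[] p : points) {
            if (Math.abs(p[0] - centerLat) <= latDelta && Math.abs(p[1] - centerLon) <= lonDelta) {
                candidates.add(p);
            }
        }
        return candidates;
    }

    /** Stage 2: exact distance test, run only on the candidates that stage 1 let through. */
    static List<double[]> withinRadius(List<double[]> candidates,
                                       double centerLat, double centerLon, double radiusKm) {
        List<double[]> hits = new ArrayList<>();
        for (double[] p : candidates) {
            if (distanceKm(centerLat, centerLon, p[0], p[1]) <= radiusKm) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {48.8566, 2.3522},   // Paris (search center)
                new double[] {48.8049, 2.1204},   // Versailles, well inside 50 km
                new double[] {45.7640, 4.8357},   // Lyon, far outside the box
                new double[] {49.2566, 2.9522});  // near a box corner: inside the box, outside the circle
        List<double[]> candidates = rangeCandidates(points, 48.8566, 2.3522, 50.0);
        List<double[]> hits = withinRadius(candidates, 48.8566, 2.3522, 50.0);
        System.out.println("after range test: " + candidates.size());
        System.out.println("after distance test: " + hits.size());
    }
}
```

The gap between the two counts corresponds to what the benchmark reports as documents fetched without and with the distance filter: the range test lets through points in the corners of the box that the exact test then rejects.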
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> > By the way, the actual storage (say via Hibernate ORM, or Infinispan) does not need to store in radians, so we don't need to do a conversion when reading an entity.
> >> >>>>
> >> >>>> Right, another reason to index only in whatever format makes querying more efficient.
> >> >>>>
> >> >>>> -- Sanne
> >> >>>>
> >> >>>>
> >> >>>> >
> >> >>>> > On 3 May 2012, at 10:45, Sanne Grinovero wrote:
> >> >>>> >
> >> >>>> > > The reason for my comment is that the code is doing a conversion to radians in the DistanceFilter, which needs to be extremely efficient as it's not only applied on the result set but potentially on the whole corpus of all Documents in the index.
> >> >>>> > > So even if it's true that conversion would be needed on the final results, we always expect people to retrieve only a limited amount of entities (like with pagination), while the index might need to perform this computation millions of times per query.
> >> >>>> > >
> >> >>>> > > If I look at the complexity of Point.getDistanceTo(double, double), I get a feeling that that method will hardly provide speedy queries because of the complex computations in it - this is just speculation at this point of course, to be sure we'd need to compare them with a large enough dataset, but it seems quite obvious that storing normalized radians should be more efficient as it would avoid a good deal of math to be executed on each Document in the index.
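To make the argument above concrete, here is a sketch of the two variants: a distance function that receives decimal degrees and converts on every call, and one that receives values already stored in radians. It assumes a haversine formula and a mean Earth radius of 6371 km. The class `RadianStorageSketch` and its methods are hypothetical and are not the actual `Point.getDistanceTo(double, double)` implementation.

```java
class RadianStorageSketch {
    // Mean Earth radius in kilometres; an approximation chosen for this sketch.
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Haversine distance on values already expressed in radians: no unit conversion per call. */
    static double distanceKmRadians(double lat1, double lon1, double lat2, double lon2) {
        double sinHalfDLat = Math.sin((lat2 - lat1) / 2);
        double sinHalfDLon = Math.sin((lon2 - lon1) / 2);
        double a = sinHalfDLat * sinHalfDLat
                + Math.cos(lat1) * Math.cos(lat2) * sinHalfDLon * sinHalfDLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Same distance from decimal degrees: four conversions on every call. */
    static double distanceKmDegrees(double lat1, double lon1, double lat2, double lon2) {
        return distanceKmRadians(Math.toRadians(lat1), Math.toRadians(lon1),
                Math.toRadians(lat2), Math.toRadians(lon2));
    }

    public static void main(String[] args) {
        // "Index time": the document's coordinates are converted once and stored as radians.
        double storedLat = Math.toRadians(45.7640);
        double storedLon = Math.toRadians(4.8357);
        // "Query time": the search center is converted once per query, not once per document.
        double centerLat = Math.toRadians(48.8566);
        double centerLon = Math.toRadians(2.3522);

        System.out.println(distanceKmRadians(centerLat, centerLon, storedLat, storedLon));
        System.out.println(distanceKmDegrees(48.8566, 2.3522, 45.7640, 4.8357));
    }
}
```

Both calls compute the same distance; the only difference is where the degree-to-radian conversion happens. The trigonometric calls remain in either variant, so how much the radian form saves in practice is a question for measurement, which is what the benchmarks elsewhere in this thread look at.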
> >> >>>> > >
> >> >>>> > > Also if we assume people might want to use radians in their user data (I know some who definitely would never touch decimals for such a use case), there would be no need at all to convert the end result.
> >> >>>> > >
> >> >>>> > > Some more thoughts inline:
> >> >>>> > >
> >> >>>> > > On 3 May 2012 09:12, Nicolas Helleringer <nicolas.hellerin...@gmail.com> wrote:
> >> >>>> > >> Hi all,
> >> >>>> > >>
> >> >>>> > >> Sanne and I have been wondering about the way the spatial branch/module/functionality for Hibernate Search shall store its coordinates in the Lucene index.
> >> >>>> > >>
> >> >>>> > >> Today it is implemented with decimal degrees for:
> >> >>>> > >> - easy debugging/readability
> >> >>>> > >> - ease of conversion on storage, as we want to accept mainly decimal degrees from users' data
> >> >>>> > >
> >> >>>> > > Valid points, but consider that "storage" is going to be way slower anyway, and typically you'll process a Document to evaluate it for a hit many many orders of magnitude more frequently than the times you store it.
> >> >>>> > >
> >> >>>> > >>
> >> >>>> > >> Sanne pointed out that when the search is done there are quite a few conversions to radians for distance calculation and suggested that we may store coordinates directly in their radian form.
> >> >>>> > >>
> >> >>>> > >> I have tried a patch to implement this and as I was coding it I felt that the code was less readable, mainly in the coordinates normalisation, and that there were as many conversions as before.
> >> >>>> > >> Conversions had moved from search to the import / export of coordinates in and out of the spatial module scope, to user scope.
> >> >>>> > >
> >> >>>> > > I'm sure the amount of points in the code in which they are converted won't change. I'm concerned about the cardinality of the collections on which it's applied ;)
> >> >>>> > > "Less readable" isn't nice, but we can work on that I guess?
> >> >>>> > >
> >> >>>> > >>
> >> >>>> > >> What the docs do not tell (yet) is that we are expecting WGS 84 (this is a coordinate system) decimal degree coordinates as input, as these are quite a de facto standard (GPS devices output this way).
> >> >>>> > >
> >> >>>> > > How does it affect this?
> >> >>>> > >
> >> >>>> > >>
> >> >>>> > >> Today it is not the purpose of the Hibernate Search spatial initiative to handle projections. There are open source libs that handle that very well on the user side (Proj4j).
> >> >>>> > >>
> >> >>>> > >> So. The question is: shall we store as radians or decimal degrees?
> >> >>>> > >>
> >> >>>> > >> Niko
> >> >>>> > >>
> >> >>>> > >> P.S.: Hope it is clear. If not, ask for more.
> >> >>>> > >
> >> >>>> > > Thanks!
> >> >>>> > > Sanne
> >> >>>> > > _______________________________________________
> >> >>>> > > hibernate-dev mailing list
> >> >>>> > > hibernate-dev@lists.jboss.org
> >> >>>> > > https://lists.jboss.org/mailman/listinfo/hibernate-dev