On 5 March 2014 16:29, Mircea Markus <[email protected]> wrote:
>
> On Mar 5, 2014, at 3:04 PM, Ales Justin <[email protected]> wrote:
>
>> But yeah, the moment I start chunking, I would still like to have them
>> grouped -- same node.
>> Or that doesn't make sense?
>> (hence having this discussion ;-)
>>
>> -Ales
>>
>> On 05 Mar 2014, at 16:01, Sanne Grinovero <[email protected]> wrote:
>>
>>> On 5 March 2014 14:54, Ales Justin <[email protected]> wrote:
>>>> Why do you chunk at all if you want them stored together?
>>>>
>>>> I only use chunking if I can't avoid it, to spread large files.
>>>>
>>>> That's what GridFS is all about -- store very large files.
>>>> Hence chunking.
>>>>
>>>> So you're saying we should know the limit of what we can store on 1 node,
>>>> if bigger, spread, therefore no grouping.
>>>
>>> Yes, but a very conservative approximation would be good enough: you
>>> don't need hardware specifications to figure out a reasonable
>>> threshold.
>>> If I had to make up a number out of thin air, I'd pick something
>>> around 10MB: any file below that threshold would not use chunking and
>>> be nicely stored together to be retrieved efficiently; beyond that
>>> start distributing.
>
> I don't think that if they are collocated, fetching all the segments to
> another node brings better performance. Might be quite the opposite actually,
> as having the segments distributed allows fetching them in parallel.

+1, although we don't do parallel fetching yet.

My opinion came from the angle of spreading the data better among the
nodes: multiple small segments are better than, say, two files of one
terabyte each, which would blow up any single node.

But this advice obviously depends on the application. If you know that
you will have many files, and you want to use other locality tricks
(like running an executor to process all the content of a file), then
you would obviously have an advantage in keeping them on the same node.
In that case, though, I'd question the use of chunking altogether.

Sanne

>
>>> (this figure could probably use some testing if you're looking into
>>> performance)
>>>
>>> Sanne
>>>
>>>>
>>>> -Ales
>>>>
>>>> On 5 Mar 2014 11:22, "Ales Justin" <[email protected]> wrote:
>>>>>
>>>>> Just having a discussion with Bela about this.
>>>>>
>>>>> I guess having "grouping" on GridFS' content would make sense.
>>>>> e.g. put all chunks on the same node
>>>>>
>>>>> Is this doable?
>>>>> Afaiu, we would need to have some sort of "similarity" function for
>>>>> content's metadata?
>>>>>
>>>>> -Ales
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
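
For illustration only, the threshold idea discussed in this thread could be sketched roughly as follows. This is not GridFS or Infinispan code: the class `ChunkPlacement`, the method `chunkKeys`, the `#index` key suffix, and the 10MB constant (the "out of thin air" figure from the thread, untested) are all hypothetical. Files at or below the threshold get a single key, so they are stored together as one entry; larger files get one key per chunk, so that hashing can spread them across nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of threshold-based chunking; not part of GridFS or Infinispan.
final class ChunkPlacement {

    // Assumption: the "around 10MB" guess from the discussion, not a measured value.
    static final long THRESHOLD_BYTES = 10L * 1024 * 1024;

    /**
     * Returns the cache keys under which a file's content would be stored.
     * Files at or below the threshold are not chunked and get a single key;
     * larger files get one key per chunk so the chunks can be distributed.
     */
    static List<String> chunkKeys(String path, long fileSize, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> keys = new ArrayList<>();
        if (fileSize <= THRESHOLD_BYTES) {
            // Small file: one entry, stored (and fetched) as a whole.
            keys.add(path);
            return keys;
        }
        // Large file: ceil(fileSize / chunkSize) chunks, each with its own key.
        long chunks = (fileSize + chunkSize - 1) / chunkSize;
        for (long i = 0; i < chunks; i++) {
            keys.add(path + "#" + i);
        }
        return keys;
    }

    public static void main(String[] args) {
        int chunkSize = 8 * 1024 * 1024;
        System.out.println(chunkKeys("/notes.txt", 1024L, chunkSize));
        System.out.println(chunkKeys("/big.iso", 25L * 1024 * 1024, chunkSize));
    }
}
```

Whether a file of exactly the threshold size should be chunked is an arbitrary choice here, and the threshold itself would need the testing mentioned above before being relied on.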
