Re: [jira] Commented: (SLING-1137) Support hierarchical child node creation from SlingPostServlet

Ian Boston Thu, 08 Oct 2009 01:19:15 -0700


On 7 Oct 2009, at 22:47, Alexander Klimetschek wrote:

On Wed, Oct 7, 2009 at 20:34, Ian Boston <i...@tfd.co.uk> wrote:
I agree, I would like to adopt sensible naming, but we keep on hitting
situations where even with the most reasonable domain prefix we end up with
2K items in a folder and then the update rates go through the floor, and
contention and unmergeable changes fall over (usually just at the worst
time possible... when load is highest).
In our case we often run out of things to slice before we reach a position
where the store works. eg ieb i/ie/ieb gives 64 at level 1, which generates
huge amounts of collision at level 2, which again only has 64, making the
maximum scale somewhere around 4096*1024 items assuming a perfect
distribution before the bottom level folders breach 1024 children. For
messaging for instance, I need a store that does about > 255^3 before
colliding, ie 16M*1024. Am I wrong to be choosing jcr as a message store
to support this use case?
I think you are really at the edge of scaling here. How many messages
are added per day? I'd think that date + maybe time (if there are more
than 2K per day) should balance it enough, for example. Organizing
messages by date is probably the best way anyway. And I guess they
won't change at all, only new ones are added, which also should reduce
contention to the node with the current time.
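
The capacity figures in the quoted paragraph can be checked with a few lines of arithmetic. This is only a rough sketch: the function name and the perfect-distribution model are assumptions for illustration, and the 1024-children limit is the working figure from the message, not a documented Jackrabbit constant.

```python
# Rough capacity check for sharded folder layouts.
# Assumption: items spread perfectly across all bottom-level folders.

MAX_CHILDREN = 1024  # working limit per bottom-level folder, from the message


def capacity(fanout_per_level, levels, max_children=MAX_CHILDREN):
    """Items storable before any bottom-level folder exceeds max_children."""
    return (fanout_per_level ** levels) * max_children


# Two prefix levels (i/ie/ieb) with about 64 distinct values each.
prefix_capacity = capacity(64, 2)    # 4096 * 1024

# Three levels with 255 values each, the target quoted for messaging.
target_capacity = capacity(255, 3)   # roughly 16M * 1024

print(prefix_capacity, target_capacity)
```

Under these assumptions the prefix scheme tops out near 4.2 million items, while the messaging target is around 17 billion.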

I agree that all of these structures help avoid the scaling issues but IMO they miss two points that have been highlighted in our use of Sling.

I am only talking about URLs here, *not* the path in JCR, unless we are forced to have a 1:1 mapping.

1. The URL space is part of the UI and "owned" by the User, UX Designer, UI developer.
2. Imposing a convention on that URL space for the affordances of the back end causes just the problem that you are concerned about. Now the UI developer needs to know the internals of how to structure those URLs to achieve scalability.

BTW, a UI developer does not write Java code. They use the REST interfaces, they might write some py, esp or rb.

On 1, our Users, UX Designers and UI developers are demanding URLs like /xxxx/yy where for all instances of yy, yy is unique, and yy might be one of 1-200K and in some instances I know of up to 4M (the 16G is an edge case, but if I break out of the Higher Ed use case there are plenty of examples of URLs where yy is one of billions). There are two solid examples: /user/eid where eid is the institutional ID, and /site/siteid where siteid is the name of the Site, eg physics101.

These URLs *must* be speakable human to human, so /site/e4f3-de45-f345-efe4 is not acceptable, and /user/i/ie/ieb, although just speakable, will remind our community of their institutional deployments of Andrew File System, IMHO *not* a good thing as for many institutions it has not been synonymous with scalability.

On 2. If we have to communicate how to structure the URL to UI developers for storage, then it hardly matters what the scheme is; we have to communicate it. An algorithm that says formatTime(now, "/{YYYY}/{MM}/{DD}/") is almost as simple as formatSha1(pathInfo, "/{01}/{23}/{45}/{67}"), but I can't ask the UI developer to do either. This is not to say that they might not decide to structure the URL in a semantic form, and I would encourage them to do so, but they always come back to the case where there is a user generated URL space that will have > 10K items at yy.
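
To make the two schemes concrete, here is a minimal Python sketch. formatTime and formatSha1 are pseudocode in the message, not Sling APIs, so the function names below are hypothetical, and reading "{01}/{23}/{45}/{67}" as the first four pairs of hex digits of the SHA-1 digest is an assumption. storage_path is likewise only an illustration of keeping the public URL flat while the back end shards the stored path.

```python
import hashlib
from datetime import datetime


def date_shard_path(when):
    """Return a /YYYY/MM/DD/ prefix for the given datetime."""
    return when.strftime("/%Y/%m/%d/")


def sha1_shard_path(path_info, levels=4):
    """Return a prefix made of the first `levels` hex-digit pairs of SHA-1.

    Assumption: "{01}/{23}/..." means digest characters 0-1, 2-3, and so on.
    """
    digest = hashlib.sha1(path_info.encode("utf-8")).hexdigest()
    pairs = [digest[i:i + 2] for i in range(0, 2 * levels, 2)]
    return "/" + "/".join(pairs) + "/"


def storage_path(url_path):
    """Map a flat URL such as /site/physics101 to a sharded storage path.

    Hypothetical helper: the hash segments go between parent and node name.
    """
    parent, _, name = url_path.rpartition("/")
    return parent + sha1_shard_path(url_path) + name


print(date_shard_path(datetime(2009, 10, 8)))
print(storage_path("/site/physics101"))
```

With two hex digits per level each folder has at most 256 children above the leaves, and the user-facing URL stays /site/physics101 as long as the server, not the UI developer, applies the mapping.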

eg "What! You mean I can't just put it at /site/xxx, I have to structure the url? But that's not what the users are saying they want, they want to be able to decide what the url to their site is and, btw, they don't like using /site, they want /xxx, you know, like http://www.bbc.co.uk/radio4" (I paraphrase a discussion of a few months ago)


If there is some other categorization of messages, eg. like the
project or group or whatever they belong to, you can put them in the
project's folder and then do the substructure via the dates. If you
give the messages a nodetype + other metadata as properties, you can
search them across projects or months/years.

Sounds like if JCR-642 was fixed, none of this would be an issue?


Not really. First of all it's not just a "fix", it requires a complete
rewrite of the internal persistence architecture in Jackrabbit.
Something for a 3.0 maybe (and there are various ideas how to do that
and also improve other bottlenecks).

But even if Jackrabbit scales with hundreds of thousands of child nodes
per node, you still have the problem of an unbalanced tree: it will be
hard, not to say impossible, to browse that tree for a human (you'd
need a very advanced paging tree view to be able to go through that)
and it just doesn't "feel" right. Well, at least to me ;-)

Agreed, a list of all nodes at yy is explicitly not supported; we use search to provide a number of different hierarchies into that space


eg
date organized http://host/messages/yyyy/mm/dd.json
tag organized http://host/tags/sling-dev.json

with a default paging enforced just as any search engine does.
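
A minimal sketch of what enforced default paging over a search result could look like. The page sizes and the page_of helper are hypothetical and not taken from Sling or from our code; the point is only that a caller can never pull the full result set in one request.

```python
DEFAULT_PAGE_SIZE = 25  # hypothetical default when the client asks for nothing
MAX_PAGE_SIZE = 100     # hypothetical hard cap, whatever the client asks for


def page_of(results, page=0, page_size=DEFAULT_PAGE_SIZE):
    """Return one page of results, clamping the page size to the cap."""
    size = max(1, min(page_size, MAX_PAGE_SIZE))
    page = max(0, page)
    start = page * size
    return {
        "page": page,
        "page_size": size,
        "total": len(results),
        "items": results[start:start + size],
    }


hits = list(range(1000))                          # stand-in for search hits
first = page_of(hits)                             # default page size applies
capped = page_of(hits, page=1, page_size=5000)    # clamped to MAX_PAGE_SIZE
```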

One point here is there are *multiple* views into the information set.

Sorry the message is so long, this is a real, possibly blocking issue for us.

Ian


Regards,
Alex

--
Alexander Klimetschek
alexander.klimetsc...@day.com
