I think it's this "know in advance that a tree is going to be big" concept
that we're currently running afoul of.

Let's say that I pretty much "know" that my tree will be small.  So I call
x/y/z.infinity.json.  All well and good.

And in another case, that I "know" that my tree is likely to be colossal, 
so I implement a search rather than an enumeration.  Good again.

But in a third case, I "know" that my tree might be "large-ish", so I
implement a lazy walk in steps as needed (say, for instance, through a tree
control).  But if I can't make a single step, then I'm sunk.
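A minimal sketch of the lazy, one-level-at-a-time walk described above, written from the client side. It assumes a hypothetical `LevelFetcher` that stands in for a request such as `x/y/z.1.json`; HTTP and JSON parsing are left out, and none of these names come from Sling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * A tree-control node that loads its children one level at a time.
 * LevelFetcher is a hypothetical stand-in for a request such as
 * GET <path>.1.json; HTTP and JSON parsing are deliberately left out.
 */
public class LazyTreeNode {

    /** Hypothetical: returns the names of the direct children of a path. */
    public interface LevelFetcher {
        List<String> childNames(String path);
    }

    private final String path;
    private final LevelFetcher fetcher;
    private List<LazyTreeNode> children; // null until this node is first expanded

    public LazyTreeNode(String path, LevelFetcher fetcher) {
        this.path = path;
        this.fetcher = fetcher;
    }

    public String path() {
        return path;
    }

    public boolean isLoaded() {
        return children != null;
    }

    /**
     * Expands this node on demand. The first call issues exactly one
     * one-level request; later calls return the cached children.
     */
    public List<LazyTreeNode> expand() {
        if (children == null) {
            List<LazyTreeNode> loaded = new ArrayList<>();
            for (String name : fetcher.childNames(path)) {
                String childPath = path.endsWith("/") ? path + name : path + "/" + name;
                loaded.add(new LazyTreeNode(childPath, fetcher));
            }
            children = Collections.unmodifiableList(loaded);
        }
        return children;
    }
}
```

The scheme only works if the one-level request succeeds for every node the user opens: if the server refuses to return even depth 1 for a node with many children, `expand()` has nothing to show, which is the failure described here.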

One could argue that "large-ish" and "DOS" aren't really of the same
magnitude, and that we just need to raise the default limit.  In the tree
control instance, even with type-ahead searching it's hard to imagine it 
being useful for 10,000 entries.  But it's clearly useful for 1000 entries 
(I've got a customer to "prove" it ;) -- so maybe we just need to raise the 
default to something like 8K (it's currently 1K).

But I still think limiting to one level is a cleaner solution.  If a single
level has enough nodes to produce a DOS attack, then it seems like you've 
got more worries than x/y/z.1.json.
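To make the "stop at 1 rather than 0" proposal concrete, here is an illustrative model of the depth-selection rule under stated assumptions: an in-memory tree, a node count that excludes the requested node itself, and an "infinity" request modelled as `Integer.MAX_VALUE`. `Node`, `countWithin` and `chooseDepth` are hypothetical names; this is not the JSON servlet's actual code.

```java
import java.util.List;

/**
 * Illustrative model of the depth-selection rule debated in this thread.
 * Node, countWithin and chooseDepth are hypothetical names; this is not
 * the code of Sling's JSON rendering servlet.
 */
public final class DepthLimit {

    /** Minimal in-memory tree node, for illustration only. */
    public record Node(String name, List<Node> children) {
        public Node(String name, Node... kids) {
            this(name, List.of(kids));
        }
    }

    private DepthLimit() {
    }

    /** Counts descendants of node down to the given depth; the node itself is not counted. */
    public static long countWithin(Node node, int depth) {
        if (depth <= 0) {
            return 0;
        }
        long total = 0;
        for (Node child : node.children()) {
            total += 1 + countWithin(child, depth - 1);
        }
        return total;
    }

    /**
     * Returns the deepest depth, at most requested, whose node count stays within
     * limit, never going below minDepth (unless fewer levels were requested).
     * Stops early once a deeper level would add no nodes.
     * minDepth = 0 models the current behaviour; minDepth = 1 models the proposal.
     */
    public static int chooseDepth(Node node, int requested, long limit, int minDepth) {
        int depth = Math.min(minDepth, requested);
        while (depth < requested) {
            long next = countWithin(node, depth + 1);
            if (next > limit || next == countWithin(node, depth)) {
                break; // too many nodes, or nothing deeper to add
            }
            depth++;
        }
        return depth;
    }
}
```

With a parent holding 2,000 direct children and a limit of 1,000, the rule returns 0 when `minDepth` is 0 and 1 when `minDepth` is 1; only the second answer still lets a client enumerate the children.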

Jeff.




-----Original Message-----
From: ianbos...@gmail.com [mailto:ianbos...@gmail.com] On Behalf Of Ian Boston
Sent: 02 December 2011 00:14
To: dev@sling.apache.org
Subject: Re: FW: Issue with DOS limitation in infinity.json servlet

On 2 December 2011 10:51, Justin Edelson <jus...@justinedelson.com> wrote:
> Hmmm. Good point. I changed my mind (slightly) - the behavior Jeff is
> describing can be supported, but should be disabled by default.
>
> Ian - to your question, with the default configuration,
> /x/y/z.(anything).json should not output 2M child nodes. IMHO, if you
> as a system operator decide to let clients request all 2M nodes, that
> should be your prerogative to do so via configuration.

Yes, agreed, if you know in advance that a tree is going to be big,
then it should be possible to configure it not to do something nasty
when a x.-1.json comes in. IIRC Sling already has this capability with
ResourceProvider and Resource.listChildren which can be implemented to
return no results.

Perhaps the protection against DOS should be removed, and users of
Sling should be encouraged to use the core Sling API to do the right
thing? Just a thought.

I think the implementation originally came from one of my use cases,
where we were trying to store "billions" of items in a multi-level
tree.... we stopped doing that for all sorts of other reasons and used
the Resource.listChildren approach.
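The `Resource.listChildren` approach, combined with the "refuse to list, then offer something custom with paging" pattern Ian mentions further down, might look roughly like the standalone analogue below. It is a sketch under assumptions: a `Map` stands in for the content store, and `GuardedChildLister`, `listChildren` and `page` here are hypothetical names, not Sling's `ResourceProvider` API. Because paging only makes sense over a defined order (JSON objects being unordered), the paged call sorts names and returns an ordered list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Standalone analogue of the approach described in the thread: the ordinary
 * child enumeration declines to list oversized parents, and a separate,
 * paged call is offered instead. All names here are hypothetical; this is
 * not Sling's ResourceProvider API.
 */
public final class GuardedChildLister {

    private final Map<String, List<String>> childrenByPath; // stand-in for the content store
    private final int enumerationLimit;

    public GuardedChildLister(Map<String, List<String>> childrenByPath, int enumerationLimit) {
        this.childrenByPath = childrenByPath;
        this.enumerationLimit = enumerationLimit;
    }

    /** listChildren-style call: yields nothing when the parent has too many children. */
    public Iterator<String> listChildren(String parent) {
        List<String> kids = childrenByPath.getOrDefault(parent, List.of());
        return kids.size() > enumerationLimit ? Collections.<String>emptyIterator() : kids.iterator();
    }

    /**
     * Paged listing. Paging only makes sense over a defined order, so names are
     * sorted first; a real store would use an index rather than sorting per call.
     */
    public List<String> page(String parent, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        List<String> sorted = new ArrayList<>(childrenByPath.getOrDefault(parent, List.of()));
        Collections.sort(sorted);
        int from = (int) Math.min((long) pageIndex * pageSize, sorted.size());
        int to = (int) Math.min((long) from + pageSize, sorted.size());
        return List.copyOf(sorted.subList(from, to));
    }
}
```

Sorting is what makes "page 2" mean the same thing across requests; without it, successive pages could overlap or skip items.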

>
> In reality, however, if you have that type of structure, I think
> you're more likely to do content discovery by search rather than
> browse. WDYT?

Yes, very true.
It's interesting/depressing to see how often lists are used as a
substitute for what is really search. The feature request shortly
follows for sorting, and selecting by first letter, and if you watch a
user they use the paging, selecting, sorting in real frustration, when
all they wanted was to be able to search for the item slap bang in the
middle of the result set.

Ian

>
> Justin
>
> On Thu, Dec 1, 2011 at 5:53 PM, Ian Boston <i...@tfd.co.uk> wrote:
>> Hi,
>> Thinking about this some more,
>> Assuming the content system can support this for a moment.
>> /x/y/z has 2M direct child nodes, what does /x/y/z.-1.json respond
>> with? 2M links to those child nodes.
>>
>> Does the system need to support paging, in the same way atom does?
>> eg /x/y/z.-1.json?page=1
>>
>> With Sling built on Jackrabbit this doesn't happen (yet), since
>> Jackrabbit (IIUC) stores child nodes internally as an array of
>> pointers on the parent node, so it may not be a real issue, but Sling
>> on something else may encounter this. In general the solution has been
>> to refuse to list child Resources via the Resource interface, and do
>> something custom with paging enforced, but as soon as paging is
>> introduced, order becomes relevant, and that opens up the validity of
>> ordering a map in json, which IIRC is defined as a bag not a list.
>>
>> Ian
>>
>> On 2 December 2011 08:50, Justin Edelson <jus...@justinedelson.com> wrote:
>>> Hi Jeff,
>>> I'm not sure why you can't just increase the limit if you run into
>>> this problem, but I am not opposed to making this change on principle.
>>>
>>> I'm very intrigued by the idea of a PostProcessor which limits the
>>> number of nodes at a particular point in the hierarchy, but that's not
>>> going to be 100% effective as Sling doesn't "own" the repository per
>>> se.
>>>
>>> Justin
>>>
>>> On Thu, Dec 1, 2011 at 4:26 PM, Jeff Young <j...@adobe.com> wrote:
>>>> The intent behind the limitation seems sound, but the implementation has 
>>>> (to my mind) a slight flaw.
>>>>
>>>> A legitimate client which needs the information could presumably implement 
>>>> its own traversal to descend the tree.  But this only works if the json 
>>>> servlet is always allowed to return at least a depth of 1.  The current 
>>>> implementation limits the depth to 0 if the node in question has more than 
>>>> the limit number of children.
>>>>
>>>> I was discussing this with Alex, who pointed out that the intent was to be 
>>>> defensive.  However, if we really want to limit the *number of children* a 
>>>> node can have, then we ought to do that elsewhere.  Given that a node 
>>>> *does* have a certain number of children, the json servlet needs to at 
>>>> least support the enumeration of said children.
>>>>
>>>> So I'd like to propose that we amend the DOS-protection-algorithm to stop 
>>>> at 1, rather than 0.
>>>>
>>>> Thoughts?
>>>>
>>>> Thanks,
>>>> Jeff.
>>>>
>>>> (PS: apologies if this gets sent out twice, but I think ezmlm ate the 
>>>> first posting because I hadn't yet confirmed my subscription so I'm
>>>> re-sending.)
>>>>
>>>>
>>>>
>>>> Jeff Young | Principal Scientist | Adobe Distinguished Inventor
>>>> Adobe Systems Software Ireland Ltd.
>>>> Registered Office: 4-6 Riverwalk, Citywest Business Campus,
>>>> Saggart, Dublin 24, Ireland   Company No. 344992
>>>>
>>>>
>>>
>
