I've created a new patch which holds an in-memory object of JSONObjects.
It adds the results to the object as it scans the children.

If there are too many children it responds with the following:
Status: 300
Header Content-Type: application/json
Body ["/path/to/node.4.json"]

In this case, /path/to/node.4.json is the URL which will result in a safe dump of the tree.

Otherwise it outputs the object with a normal 200.
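
The behaviour described above can be sketched roughly as follows. This is not the code from the patch; it is a minimal Python illustration that assumes the tree is a nested dict (dict values are child nodes, everything else is a property). The function name render and the parameters path, depth and limit are made up for the example.

```python
import json

def render(path, root, depth, limit):
    """Return (status, headers, body) for a request like <path>.<depth>.json.

    Hypothetical sketch: builds the result in memory level by level and
    gives up with a 300 as soon as the next level would exceed `limit` nodes.
    """
    headers = {"Content-Type": "application/json"}
    result = {}
    frontier = [(root, result)]   # (source node, output dict) for the current level
    count = 1                     # nodes collected so far (the root)
    level = 0
    while True:
        children = []
        for src, dst in frontier:
            for name, value in src.items():
                if isinstance(value, dict):
                    children.append((value, dst, name))   # child node, handled next level
                else:
                    dst[name] = value                     # plain property
        if level == depth or not children:
            break
        if count + len(children) > limit:
            # Too many nodes: point the client at the deepest level that still fits.
            return 300, headers, json.dumps([f"{path}.{level}.json"])
        frontier = []
        for src, parent, name in children:
            parent[name] = {}
            frontier.append((src, parent[name]))
        count += len(children)
        level += 1
    return 200, headers, json.dumps(result)
```

Nothing is written to the response until the traversal has either finished or hit the limit, which is what makes it possible to choose between 200 and 300 in a single pass.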

Pushed the patch to: http://codereview.appspot.com/186167

WDYT?

Simon


On 14 Jan 2010, at 10:02, Dominik Süß wrote:

Sure, this would require buffering the result instead of streaming it
directly. But that would always be mandatory if the response code should
depend on the determined data. My idea was to collect the data (prepare the
response, not write to the response) and check for depth in the same run.

For HTTP this would be a 302 since the data may change at any time and
should always be accessed by the original URI.

Dominik

On Thu, Jan 14, 2010 at 10:24 AM, Ian Boston <[email protected]> wrote:

I think, if you wanted to modify the response which is being streamed and
may have already been committed, then you would need to know in advance.
However, if the indication that the results were truncated was in the
response itself (i.e. add a continuation URL), then a single pass is
probably all that's required.

The only downside of a continuation URL in the response is it changes the
HTTP API.

Ian

On 14 Jan 2010, at 09:19, Dominik Süß wrote:

Why would this require two runs?
The data of depth could be collected while iterating through the levels.
It would require some adjustments like prefetching for the current level to
check if the result would get too big, but appending this list to the other
list should be a cheap operation.

Best regards
Dominik

On Wed, Jan 13, 2010 at 9:54 AM, Felix Meschberger <[email protected]> wrote:

Hi,

First off, I think scanning the (sub-)tree twice (once for checking,
once for sending) is not a good idea performance-wise anyway.

Putting this aside, the check-part could scan breadth-first, keeping record
of the number of items visited after each level. As soon as the threshold
has been reached, the maximum supported level is known.

The rendering-part can then use the level to actually render the data,
which is done in a depth-first manner.

We might be able to combine the two approaches, by building an in-memory
representation of the JSON data (JSONObject) and when the threshold has
been reached, just serialize the JSONObject.
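
As a rough illustration of the check-part and rendering-part described above (a Python sketch, not Sling code): the tree is assumed to be a nested dict where dict values are child nodes, and the names max_safe_level and render_to_level are hypothetical.

```python
def max_safe_level(root, limit):
    """Breadth-first check: deepest level whose cumulative node count is <= limit."""
    level = 0
    count = 1                     # the root itself
    frontier = [root]
    while True:
        # All child nodes of the current level (dict values are child nodes).
        children = [v for node in frontier for v in node.values()
                    if isinstance(v, dict)]
        if not children or count + len(children) > limit:
            return level
        count += len(children)
        frontier = children
        level += 1

def render_to_level(node, level):
    """Depth-first rendering of `node` down to `level` levels of children."""
    out = {}
    for name, value in node.items():
        if isinstance(value, dict):
            if level > 0:
                out[name] = render_to_level(value, level - 1)
        else:
            out[name] = value     # plain property
    return out
```

Each node is counted once during the check, so the cost is one breadth-first pass plus one depth-first pass over the part that is actually rendered.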

Regards
Felix

On 13.01.2010 00:51, Simon Gaeremynck wrote:
Ok,

Is the following approach better?

Consider node.10.json

Check if the response will contain no more than 200 nodes.
If so, proceed with the way it is now and send the resources along with
a 200 response code.
If not,
Check if node.0.json results in a set bigger than 200 nodes.
If not, check node.1.json, then node.2.json, ...
Basically, keep increasing the level until the number of resources is
bigger than 200.
This would give the highest recursion level you can request.
The server would then respond with a 300 and (I think?) a header
'Location' with the highest level.

The thing, of course, is that you would have to loop over all those nodes
again and again.
Jackrabbit will have caches for those nodes, but I'm not really sure what
the impact on performance would be.


Simon


On 12 Jan 2010, at 00:53, Roy T. Fielding wrote:

On Jan 11, 2010, at 10:01 AM, Simon Gaeremynck wrote:

Yes, I guess that could work.

But then you can still do node.1000000.json which results in the same
thing.

I took the liberty to write a patch which checks the number of
resources that will be in the result.
If the result is bigger than a pre-defined OSGi property (e.g. 200
resources) it will send a 206 partial content with the dump of 200
resources and will ignore the rest.

It can be found at http://codereview.appspot.com/186072

Simon

Sorry, that would violate HTTP.  Consider what impact it has on
caching by intermediaries.

Generally speaking, treating HTTP as if it were a database is a
bad design.  If the server has a limit on responses, then it
should only provide identifiers that remain under that limit
and forbid any identifiers that would imply a larger limit.

An easy way to avoid this is to respond with 300 and an index of
available resources whenever the resource being requested would be too big.
The client can then retrieve the individual (smaller) resources from
that index.

....Roy






