...continuing on the mailing list; I think this exceeds what an issue tracker is good for.

angela (JIRA) wrote:
angela commented on JCR-1405:
-----------------------------

julian:

if you can't determine the childinfos upon creating the nodeinfo you should (as stated by the javadoc) simply return null,

Trying to determine the child infos in practice means asking the underlying storage for them. If there are 1000 children, and I get an internal error for the last, I would then have to return null. Which means that JCR2SPI asks again, using RepositoryService. Not good.

The only case where this new method would actually help is where the set of child node names is known in advance, such as for nodes of type nt:file. It's nice to be able to optimize those, but not sufficient.

We started the discussion because of the horrific performance of JCR2SPI for large collections (where it currently reaches something around 2% of what my persistence layer can do). Are we still trying to solve this?

if you can't build the nodeinfo due to some exceptional situation you should throw upon getNodeInfo or getItemInfos respectively.

the exception with RepositoryService.getChildInfos means the same as the one defined with getNodeInfo or getItemInfos:
- the target node does not exist (any more) in the persistent state
- the persistent layer can't be accessed or something similar.

Well. If the *construction* of the NodeInfo now requires deciding whether to return child infos or not, then this change doesn't help, because it doesn't scale for large collections.

I'm not going to retrieve child information unless somebody asks for it -- and that is when NodeInfo.getChildInfos is called, not when the NodeInfo is constructed.
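
To make the lazy approach above concrete, here is a minimal sketch. `LazyNodeInfo` and its `Supplier`-based loader are hypothetical stand-ins invented for illustration, not the actual SPI `NodeInfo` interface; the only point is that construction does no storage access and the children are fetched on the first `getChildInfos` call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for an SPI node info; NOT the real Jackrabbit SPI interface.
final class LazyNodeInfo {
    private final String path;
    private final Supplier<List<String>> childLoader; // asks the storage for child names
    private List<String> children;                    // stays null until first requested

    LazyNodeInfo(String path, Supplier<List<String>> childLoader) {
        // Construction only records how to load the children; no storage access here.
        this.path = path;
        this.childLoader = childLoader;
    }

    String getPath() {
        return path;
    }

    // The storage is consulted on the first call only; later calls reuse the result.
    synchronized List<String> getChildInfos() {
        if (children == null) {
            children = Collections.unmodifiableList(new ArrayList<>(childLoader.get()));
        }
        return children;
    }
}
```

With this shape, a caller that never asks for the children never pays for them, which is the property that matters for large collections.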

therefore i am with marcel's explanation how nodeinfo should be created and work.

in addition, if you decide to do some lazy loading of the childinfos upon NodeInfo.getChildInfos (or upon RepositoryService.getChildInfos) the exception from my point of view is not raised upon building the iterator but upon retrieving the next element... and there you won't be able to throw repository exception either.

...which may be an indication that a generic Iterator is not the right thing to use either.
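
As an illustration of what a non-generic iterator could look like, here is a sketch of a cursor whose methods may throw a checked exception. Everything in it is an assumption made for the example: `StorageException` stands in for `javax.jcr.RepositoryException`, and `ChildCursor` and `FailingCursor` are hypothetical names, not part of the SPI.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical checked exception standing in for javax.jcr.RepositoryException.
class StorageException extends Exception {
    StorageException(String message) {
        super(message);
    }
}

// Hypothetical alternative to java.util.Iterator: both methods may fail
// with a checked exception while the children are being fetched lazily.
interface ChildCursor<T> {
    boolean hasNext() throws StorageException;

    T next() throws StorageException;
}

// Toy cursor over an in-memory list that simulates a storage failure at one index.
final class FailingCursor implements ChildCursor<String> {
    private final List<String> names;
    private final int failAt;
    private int pos = 0;

    FailingCursor(List<String> names, int failAt) {
        this.names = names;
        this.failAt = failAt;
    }

    @Override
    public boolean hasNext() {
        return pos < names.size();
    }

    @Override
    public String next() throws StorageException {
        if (pos >= names.size()) {
            throw new NoSuchElementException();
        }
        if (pos == failAt) {
            // The failure surfaces while iterating, not when the cursor is created.
            throw new StorageException("cannot read child #" + pos);
        }
        return names.get(pos++);
    }
}
```

The caller sees the storage failure as a checked exception at the element where it happens, instead of having to wrap it in a runtime exception.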

regarding "large":
this is just one obvious example of what could be a reason for the implementation NOT to reveal the child infos upon NodeInfo.getChildInfos. and the description mentions this as an example.

Again; I started this discussion because of the performance for large collections. You seem to try to solve an entirely different problem -- do we have any data that indicates that it's worth solving?

How exactly is it better than batch read?

that it states: if the impl is not willing.

Not willing means that the SPI implementation decides upon internal rules whether the childinfos are included or not. examples: the impl. decides

- based on the internal structure of the persistent layer in general
- based on the cost of retrieving childinfos (given the potential chance of never being asked for)

See -- that's the problem. It seems to me what we really need is a way to indicate that the children *will* be needed.

- based on the known characteristics of the target node: e.g. we have folders and files and other nodes
  and we assume that folders will be used for displaying the children so send it. for any other nodes we don't

See above -- doesn't work in practice.

- based on the simple amount of child nodes if we know that (don't calc if more than 14)
- based on an implementation-specific configuration
  that could include nodetypes, number of child nodes, time of day, session.userId, random... whatever
  you feel would be appropriate, reasonable or simply a good thing for your specific store.

the last is pretty much what we discussed for the getItemInfos method for the batch read. we said that we can't add a config to the spi interfaces and want to leave that to the impl because we would not be able to find something that fits the needs for all potential implementations.

I do agree that the SPI impl needs to decide on things like that. But we have to give it sufficient information.
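
One way to give the implementation that information would be an explicit hint from the caller. The sketch below is purely illustrative: `HintedService`, `HintedNodeInfo` and the `childrenWanted` flag are hypothetical names, not existing SPI methods, and the in-memory map only stands in for a real store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical result type; a null child list means "not included, ask separately".
final class HintedNodeInfo {
    final String path;
    final List<String> childNames;

    HintedNodeInfo(String path, List<String> childNames) {
        this.path = path;
        this.childNames = childNames;
    }
}

// Hypothetical service backed by an in-memory map of path -> child names.
final class HintedService {
    private final Map<String, List<String>> store;

    HintedService(Map<String, List<String>> store) {
        this.store = store;
    }

    // childrenWanted is the caller's hint. A real implementation could still
    // decline to include the children; this sketch always honors the hint.
    HintedNodeInfo getNodeInfo(String path, boolean childrenWanted) {
        List<String> stored = store.get(path);
        if (stored == null) {
            throw new IllegalArgumentException("no such node: " + path);
        }
        List<String> children = childrenWanted ? new ArrayList<>(stored) : null;
        return new HintedNodeInfo(path, children);
    }
}
```

The decision stays with the implementation, but it no longer has to guess from node types or child counts whether the children will be read.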

if your store can't retrieve the child info you may
- create your reposervice with a config and leave the decision to someone else
- always calculate the child infos

Again, that doesn't work for the use case we're trying to solve. Or at least the one I thought we were trying to solve.

- never calculate the child infos
- decide based on the characteristics of the requested node -...
(see above)

so. i am not in favor of adding exceptions to the new method... at least not for the reasons presented so far.
angela

I'm in favor of first clearly stating what we're trying to do; then creating tests for obtaining measurements; and then re-discussing what needs to be done.

BR, Julian
