...continuing on the mailing list; I think this exceeds what an issue
tracker is good for.
angela (JIRA) wrote:
angela commented on JCR-1405:
-----------------------------
julian:
if you can't determine the child infos upon creating the NodeInfo you should (as
stated by the javadoc) simply return null,
Trying to determine the child infos in practice means asking the
underlying storage for them. If there are 1000 children, and I get an
internal error for the last, I would then have to return null. Which
means that JCR2SPI asks again, using RepositoryService. Not good.
The only case where this new method would actually help is where the set
of child node names is known in advance, such as for nodes of type
nt:file. It's nice to be able to optimize those, but not sufficient.
We started the discussion because of the horrific performance of JCR2SPI
for large collections (where it currently reaches something around 2% of
what my persistence layer can do). Are we still trying to solve this?
if you can't build the NodeInfo due to some exceptional situation you should
throw upon getNodeInfo or getItemInfos, respectively.
the exception thrown by RepositoryService.getChildInfos means the same as the
one defined for getNodeInfo or getItemInfos:
- the target node does not exist (any more) in the persistent state
- the persistent layer can't be accessed or something similar.
Well. If the *construction* of the NodeInfo now requires deciding
whether to return child infos or not, then this change doesn't help,
because it doesn't scale for large collections.
I'm not going to retrieve child information unless somebody asks for it
-- and that is when NodeInfo.getChildInfos is called, not when the
NodeInfo is constructed.
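As a rough illustration of deferring the lookup until somebody asks, here is a minimal sketch. All names in it (LazyNodeInfo, childLoader, getChildNames) are hypothetical stand-ins rather than the actual SPI types, and child infos are reduced to plain names to keep it short:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for an SPI NodeInfo; not the real Jackrabbit interface.
final class LazyNodeInfo {
    private final String path;
    private final Supplier<List<String>> childLoader; // assumed callback into the storage
    private List<String> childNames;                  // stays null until first requested

    LazyNodeInfo(String path, Supplier<List<String>> childLoader) {
        this.path = path;
        this.childLoader = childLoader;
    }

    String getPath() {
        return path;
    }

    // The storage is asked on the first call only; later calls reuse the result.
    synchronized List<String> getChildNames() {
        if (childNames == null) {
            childNames = childLoader.get();
        }
        return childNames;
    }
}

final class LazyNodeInfoDemo {
    public static void main(String[] args) {
        int[] storageCalls = {0};
        LazyNodeInfo info = new LazyNodeInfo("/collection", () -> {
            storageCalls[0]++;
            List<String> names = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                names.add("child" + i);
            }
            return names;
        });
        System.out.println("storage calls after construction: " + storageCalls[0]);
        System.out.println("children: " + info.getChildNames().size());
        System.out.println("storage calls after first access: " + storageCalls[0]);
    }
}
```

Constructing the object costs nothing on the storage side; the cost is only paid by callers that actually ask for the children.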
therefore i agree with marcel's explanation of how a NodeInfo should be created and work.
in addition, if you decide to do some lazy loading of the child infos upon
NodeInfo.getChildInfos (or upon RepositoryService.getChildInfos), the exception,
from my point of view, is not raised upon building the iterator but upon
retrieving the next element... and there you won't be able to throw a
RepositoryException either.
...which may be an indication that a generic Iterator is not the right
thing to use either.
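To make the alternative concrete: a dedicated iterator type could declare a checked exception on its methods, which java.util.Iterator cannot do. The sketch below is only an illustration under that assumption; StoreException stands in for javax.jcr.RepositoryException, and ChildInfoIterator and FailingChildInfoIterator are invented names, not part of the SPI:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical checked exception standing in for javax.jcr.RepositoryException.
class StoreException extends Exception {
    StoreException(String message) {
        super(message);
    }
}

// Hypothetical iterator whose methods may report storage failures to the caller.
interface ChildInfoIterator {
    boolean hasNext() throws StoreException;
    String next() throws StoreException;
}

// Demo implementation that simulates a storage error after a fixed number of elements.
final class FailingChildInfoIterator implements ChildInfoIterator {
    private final Deque<String> remaining;
    private final int failAfter;
    private int delivered;

    FailingChildInfoIterator(List<String> names, int failAfter) {
        this.remaining = new ArrayDeque<>(names);
        this.failAfter = failAfter;
    }

    @Override
    public boolean hasNext() {
        return !remaining.isEmpty();
    }

    @Override
    public String next() throws StoreException {
        if (remaining.isEmpty()) {
            throw new NoSuchElementException();
        }
        if (delivered == failAfter) {
            throw new StoreException("storage failed after " + delivered + " children");
        }
        delivered++;
        return remaining.removeFirst();
    }
}

final class ChildInfoIteratorDemo {
    public static void main(String[] args) {
        ChildInfoIterator it = new FailingChildInfoIterator(Arrays.asList("a", "b", "c"), 2);
        try {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        } catch (StoreException e) {
            System.out.println("caller sees: " + e.getMessage());
        }
    }
}
```

The caller is forced by the compiler to handle a failure in the middle of the iteration, instead of receiving an unchecked exception or a silently truncated result.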
regarding "large":
this is just one obvious example of what could be a reason for the implementation
NOT to reveal the child infos upon NodeInfo.getChildInfos, and the description
mentions this as an example.
Again; I started this discussion because of the performance for large
collections. You seem to try to solve an entirely different problem --
do we have any data that indicates that it's worth solving?
How exactly is it better than batch read?
that it states: if the impl is not willing.
Not willing means that the SPI implementation decides, based on internal rules,
whether the child infos are included or not. examples: the impl decides
- based on the internal structure of the persistent layer in general
- based on the cost of retrieving child infos (given the potential chance of
  never being asked for them)
See -- that's the problem. It seems to me what we really need is a way
to indicate that the children *will* be needed.
- based on the known characteristics of the target node: e.g. we have folders,
  files and other nodes, and we assume that folders will be used for displaying
  the children, so we send them. for any other nodes we don't
See above -- doesn't work in practice.
- based on the sheer number of child nodes, if we know it (don't calculate if
  more than 14)
- based on an implementation-specific configuration that could include
  node types, number of child nodes, time of day, session.userId, random...
  whatever you feel would be appropriate, reasonable or simply a good thing
  for your specific store.
the last is pretty much what we discussed for the getItemInfos method for the
batch read. we said that we can't add a config to the SPI interfaces and want
to leave that to the impl, because we would not be able to find something that
fits the needs of all potential implementations.
I do agree that the SPI impl needs to decide on things like that. But we
have to give it sufficient information.
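One way to picture "sufficient information" is a caller-supplied hint that the implementation's own rules can take into account. The following is only a sketch under that assumption: ChildrenHint and ChildInfoPolicy are hypothetical names and nothing like them exists in the SPI; the threshold of 14 is borrowed from the example quoted above.

```java
// Hypothetical hint passed down by the caller; not part of the SPI under discussion.
enum ChildrenHint { UNKNOWN, WILL_BE_NEEDED, NOT_NEEDED }

// Hypothetical implementation-side rule combining the hint with what the store knows.
final class ChildInfoPolicy {
    private final int maxEagerChildren; // implementation-specific threshold

    ChildInfoPolicy(int maxEagerChildren) {
        this.maxEagerChildren = maxEagerChildren;
    }

    // knownChildCount < 0 means the store cannot tell the count cheaply.
    boolean includeChildInfos(ChildrenHint hint, long knownChildCount) {
        switch (hint) {
            case WILL_BE_NEEDED:
                return true;  // the caller announced it will iterate the children
            case NOT_NEEDED:
                return false; // the caller announced it will not
            default:
                // No hint: fall back to the store's own size-based rule.
                return knownChildCount >= 0 && knownChildCount <= maxEagerChildren;
        }
    }
}

final class ChildInfoPolicyDemo {
    public static void main(String[] args) {
        ChildInfoPolicy policy = new ChildInfoPolicy(14);
        System.out.println(policy.includeChildInfos(ChildrenHint.WILL_BE_NEEDED, 1000));
        System.out.println(policy.includeChildInfos(ChildrenHint.UNKNOWN, 1000));
        System.out.println(policy.includeChildInfos(ChildrenHint.UNKNOWN, 10));
    }
}
```

The decision stays with the implementation, but for a large collection it no longer has to guess whether the children will be used.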
if your store can't retrieve the child infos you may
- create your RepositoryService with a config and leave the decision to someone else
- always calculate the child infos
Again, that doesn't work for the use case we're trying to solve. Or at
least the one I thought we were trying to solve.
- never calculate the child infos
- decide based on the characteristics of the requested node
-...
(see above)
so. i am not in favor of adding exceptions to the new method... at least not
for the reasons presented so far.
angela
I'm in favor of first clearly stating what we're trying to do; then creating
tests to obtain measurements; and then re-discussing what needs to be done.
BR, Julian