[
https://issues.apache.org/jira/browse/SOLR-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081382#comment-15081382
]
Jason Gerlowski edited comment on SOLR-8480 at 1/4/16 5:05 PM:
---------------------------------------------------------------
As a disclaimer, I'm new to the Streaming Expression API and TupleStreams in
general. So take what I say with a grain of salt.
The {{getConsumed()}} method might be do-able and useful, but I'm less sure
about a {{getSize()}} method.
As I understand the idea of streaming, nothing really knows how many records
are in the stream. That's one of the main points/advantages. The whole result
set is never fetched all at once, even in the leaves of the TupleStream
hierarchy. All things are possible I suppose, but right now there's nothing
that knows the size of the result-set.
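To illustrate why {{getConsumed()}} looks like the easier half, here's a rough sketch using plain {{java.util.Iterator}} rather than the real TupleStream classes. {{CountingIterator}} is a name I made up for this example and is not part of Solr. It only counts what has already passed through it, so it never needs to know how much is left:

```java
import java.util.Iterator;

// Hypothetical wrapper, not a Solr class: counts items as they are pulled.
class CountingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private long consumed = 0;

    CountingIterator(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T item = source.next(); // throws NoSuchElementException when exhausted
        consumed++;             // only incremented after a successful read
        return item;
    }

    // Number of items handed out so far; says nothing about how many remain.
    long getConsumed() {
        return consumed;
    }
}
```

A matching {{getSize()}} has no equally cheap answer here: the wrapper would have to drain the source to find out, which defeats the point of streaming.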
Even assuming that we do know the result-size of underlying searches,
{{getSize}} would be pretty tricky to figure out for some decorator
TupleStreams. For example, consider: {{unique(search(...))}}. How would a
UniqueStream define its size? Even if the underlying search knows how many
results there are total, that doesn't necessarily give UniqueStream any hint at
how many tuples it will output. That depends on the actual result values
returned by the search(). It can't really be known until all search-result
values have been read/processed by UniqueStream.
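A small sketch of the problem, again with plain iterators instead of the real classes. {{UniqueIterator}} and {{UniqueSketch}} are hypothetical names for this example, not Solr's UniqueStream, and the sketch assumes the input is sorted (so duplicates are adjacent) and contains no nulls:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a unique() decorator over a sorted source.
class UniqueIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T previous;  // last value emitted; null before the first one
    private T lookahead; // next distinct value, or null if not fetched yet

    UniqueIterator(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        // Skip over values equal to the one most recently emitted.
        while (lookahead == null && source.hasNext()) {
            T candidate = source.next();
            if (!candidate.equals(previous)) {
                lookahead = candidate;
            }
        }
        return lookahead != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        previous = lookahead;
        lookahead = null;
        return previous;
    }
}

class UniqueSketch {
    // Drains the iterator and reports how many items it produced.
    static <T> long countOutput(Iterator<T> it) {
        long n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }
}
```

Feeding it the four values x, x, x, x yields one item, while w, x, y, z yields four. Both inputs have the same size, so the input size alone can't tell the decorator what to return from {{getSize()}}.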
It would be nice to have these methods, but it doesn't seem possible in the
current streaming API. Unless I'm missing something, that is. That's
definitely possible, as I'm still new to Solr. Did you have a particular
method in mind for reporting these sorts of stats?
> Progress info for TupleStream
> -----------------------------
>
> Key: SOLR-8480
> URL: https://issues.apache.org/jira/browse/SOLR-8480
> Project: Solr
> Issue Type: Improvement
> Components: SolrJ
> Reporter: Cao Manh Dat
>
> I suggest adding progress info for TupleStream. It can be very helpful for
> tracking the consuming process.
> {code}
> public abstract class TupleStream {
>   public abstract long getSize();
>   public abstract long getConsumed();
> }
> {code}