Vladimir Sitnikov created CALCITE-4480:
------------------------------------------
Summary: Make EnumerableDefaults#union a non-blocking operation
Key: CALCITE-4480
URL: https://issues.apache.org/jira/browse/CALCITE-4480
Project: Calcite
Issue Type: Improvement
Components: core
Affects Versions: 1.26.0
Reporter: Vladimir Sitnikov
Currently, EnumerableDefaults#union buffers all the rows before it returns the
first of them.
Pros:
1) Faster iteration when the same enumerable is enumerated more than once
Cons:
1) The implementation does not work with infinite streams, since no row is
returned until both inputs have been read completely
2) Holds on to the buffered rows even after iteration is finished
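To make the trade-off concrete, here is a minimal plain-Java sketch of a buffering union. It uses java.util collections instead of linq4j and is not Calcite's actual EnumerableDefaults#union code; the class name, the counting() helper and the pulled counter are hypothetical. It shows that every input row is read before the caller can see the first output row, and that the returned object keeps the whole set reachable.

{code:java}
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a buffering union over plain java.util types
// (not Calcite's actual EnumerableDefaults#union code).
final class BlockingUnionSketch {
  // Number of rows pulled from the inputs so far.
  static int pulled = 0;

  // Wraps an Iterable so that every next() call is counted.
  static <T> Iterable<T> counting(Iterable<T> source) {
    return () -> {
      Iterator<T> it = source.iterator();
      return new Iterator<T>() {
        @Override public boolean hasNext() { return it.hasNext(); }
        @Override public T next() { pulled++; return it.next(); }
      };
    };
  }

  // Drains both inputs into a set before returning anything.
  static <T> Iterable<T> union(Iterable<T> source0, Iterable<T> source1) {
    Set<T> set = new LinkedHashSet<>(); // LinkedHashSet only for a stable demo order
    source0.forEach(set::add);
    source1.forEach(set::add);
    return set; // the set stays alive as long as the returned Iterable does
  }

  public static void main(String[] args) {
    Iterable<Integer> u = union(counting(List.of(1, 2, 3)), counting(List.of(3, 4, 5)));
    // All six input rows were read before the caller asked for a single row.
    System.out.println("pulled before first row: " + pulled); // prints "pulled before first row: 6"
    System.out.println("rows: " + u);
  }
}
{code}

Re-iterating the result only walks the set, which is the "faster repeated iteration" benefit; the same retained set is what makes infinite inputs impossible.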
---
An alternative might be something like:
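{code:java}
// Intended to live in EnumerableDefaults; needs java.util.HashSet and java.util.Set.
public static <TSource> Enumerable<TSource> union(Enumerable<TSource> source0,
    Enumerable<TSource> source1) {
  // Lazily concatenate both inputs; no rows are read yet.
  Enumerable<TSource> unionAll = concat(source0, source1);
  return new AbstractEnumerable<TSource>() {
    @Override public Enumerator<TSource> enumerator() {
      // A fresh set per enumeration: Set#add returns false for rows already
      // seen, so where() passes only the first occurrence of each row.
      Set<TSource> set = new HashSet<>();
      return EnumerableDefaults.where(unionAll, set::add).enumerator();
    }
  };
}
{code}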
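The same idea can be approximated without linq4j as a hand-written Iterator that walks the two inputs in order and passes each row through a per-iterator HashSet, which is what concat plus where(set::add) does. The names below (StreamingUnionSketch, union, take) are hypothetical; this is a sketch for illustration, not a drop-in replacement for the linq4j code.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical plain-Java analogue of concat(source0, source1) + where(set::add).
final class StreamingUnionSketch {
  static <T> Iterable<T> union(Iterable<T> source0, Iterable<T> source1) {
    // Each iterator() call gets its own set, like the per-enumerator HashSet.
    return () -> new Iterator<T>() {
      private final Set<T> seen = new HashSet<>();
      private final Iterator<T> first = source0.iterator();
      private Iterator<T> second; // opened only after the first input is exhausted
      private T pending;
      private boolean hasPending;

      // Returns the input currently being read (first, then second).
      private Iterator<T> current() {
        if (first.hasNext()) {
          return first;
        }
        if (second == null) {
          second = source1.iterator();
        }
        return second;
      }

      @Override public boolean hasNext() {
        // Pull input rows until one has not been seen before, or both inputs end.
        while (!hasPending) {
          Iterator<T> it = current();
          if (!it.hasNext()) {
            return false;
          }
          T candidate = it.next();
          if (seen.add(candidate)) { // add() returns false for duplicates
            pending = candidate;
            hasPending = true;
          }
        }
        return true;
      }

      @Override public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        hasPending = false;
        T result = pending;
        pending = null;
        return result;
      }
    };
  }

  // Reads at most n rows, so it is safe on infinite inputs.
  static <T> List<T> take(Iterable<T> source, int n) {
    List<T> out = new ArrayList<>();
    Iterator<T> it = source.iterator();
    while (out.size() < n && it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }

  public static void main(String[] args) {
    Iterable<Integer> naturals = () -> Stream.iterate(0, i -> i + 1).iterator();
    System.out.println(take(union(naturals, List.of(1, 2)), 5)); // [0, 1, 2, 3, 4]
    System.out.println(take(union(List.of(1, 2, 2, 3), List.of(3, 4, 1)), 10)); // [1, 2, 3, 4]
  }
}
{code}

Rows come out as soon as they are first seen, so an infinite input yields its first rows immediately. The set still grows with the number of distinct rows seen, so memory is bounded by the distinct count rather than avoided entirely.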
Pros:
1) Supports infinite streams
2) In theory, it could release the HashSet once iteration finishes
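One way to realize the second point, assuming a hand-written iterator is used instead of the where(set::add) composition, is to drop the set reference as soon as the input reports no more rows, and also on close() for early termination. linq4j's Enumerator has a close() method that could play the same role; the class below is hypothetical and uses plain java.util types.

{code:java}
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical distinct-filtering iterator that drops its HashSet as soon as
// the input is exhausted or the caller closes it early.
final class ReleasingDistinctIterator<T> implements Iterator<T>, AutoCloseable {
  private final Iterator<T> input; // e.g. the concatenation of both union inputs
  private Set<T> seen = new HashSet<>();
  private T pending;
  private boolean hasPending;

  ReleasingDistinctIterator(Iterator<T> input) {
    this.input = input;
  }

  @Override public boolean hasNext() {
    while (!hasPending) {
      if (seen == null || !input.hasNext()) {
        seen = null; // iteration finished: the set becomes garbage
        return false;
      }
      T candidate = input.next();
      if (seen.add(candidate)) {
        pending = candidate;
        hasPending = true;
      }
    }
    return true;
  }

  @Override public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    hasPending = false;
    T result = pending;
    pending = null;
    return result;
  }

  // Early termination (for instance a LIMIT upstream) also releases the set.
  @Override public void close() {
    seen = null;
    hasPending = false;
    pending = null;
  }

  boolean isSetReleased() {
    return seen == null;
  }

  public static void main(String[] args) {
    ReleasingDistinctIterator<Integer> it =
        new ReleasingDistinctIterator<>(List.of(1, 2, 1, 3).iterator());
    while (it.hasNext()) {
      System.out.print(it.next() + " ");
    }
    System.out.println(it.isSetReleased()); // prints "1 2 3 true"
  }
}
{code}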
Cons:
1) Slower iteration when the same enumerable is enumerated more than once (the
HashSet is rebuilt on every enumeration)
2) The concat + AbstractEnumerable wrappers might cost extra CPU cycles
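The first drawback can be observed by counting how many source rows are read when the same union is enumerated twice. In this hypothetical sketch, java.util.stream's concat and distinct stand in for linq4j's concat plus where(set::add); the class and its helpers are illustrative only, and the counts say nothing about the constant-factor overhead in point 2, which would need a benchmark.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical comparison of source reads under repeated enumeration.
final class RepeatedEnumerationSketch {
  static int pulls = 0; // source rows read so far

  // Wraps a list so that every next() call is counted.
  static <T> Iterable<T> counting(List<T> rows) {
    return () -> {
      Iterator<T> it = rows.iterator();
      return new Iterator<T>() {
        @Override public boolean hasNext() { return it.hasNext(); }
        @Override public T next() { pulls++; return it.next(); }
      };
    };
  }

  // Buffered: inputs are read once, when the union is built.
  static <T> Iterable<T> buffered(Iterable<T> a, Iterable<T> b) {
    Set<T> set = new LinkedHashSet<>();
    a.forEach(set::add);
    b.forEach(set::add);
    return set;
  }

  // Streaming: every iterator() call builds a new pipeline, so both inputs
  // are re-read and distinct() starts again from empty state.
  static <T> Iterable<T> streaming(Iterable<T> a, Iterable<T> b) {
    return () -> Stream.concat(
            StreamSupport.stream(a.spliterator(), false),
            StreamSupport.stream(b.spliterator(), false))
        .distinct()
        .iterator();
  }

  static <T> List<T> drain(Iterable<T> source) {
    List<T> out = new ArrayList<>();
    source.forEach(out::add);
    return out;
  }

  public static void main(String[] args) {
    Iterable<Integer> buf = buffered(counting(List.of(1, 2, 3)), counting(List.of(3, 4, 5)));
    drain(buf);
    drain(buf);
    System.out.println("buffered pulls: " + pulls); // prints "buffered pulls: 6"

    pulls = 0;
    Iterable<Integer> str = streaming(counting(List.of(1, 2, 3)), counting(List.of(3, 4, 5)));
    drain(str);
    drain(str);
    System.out.println("streaming pulls: " + pulls); // prints "streaming pulls: 12"
  }
}
{code}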
--
This message was sent by Atlassian Jira
(v8.3.4#803005)