zhztheplayer commented on issue #5136:
URL: https://github.com/apache/incubator-gluten/issues/5136#issuecomment-2021919347
The major issue I found is that the `flatMap` approach causes
`UnsafeHashedRelation` to produce duplicated rows in my case (TPCDS q14a with
the current version of ACBO), while the `map` approach causes
`LongHashedRelation` to lose rows (TPCDS q2).
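To make the two symptoms concrete, here is a toy Scala model. It does not use Spark's classes: `ToyRelations`, `PerRowRelation`, `PerKeyRelation`, `flatMapStrategy`, and `mapStrategy` are hypothetical names. It rests on an unverified hypothesis: an `UnsafeHashedRelation`-like layout yields one key from `keys()` per stored row, while a `LongHashedRelation`-like layout yields each distinct key once and `getValue` returns only the first row for that key. Under those assumptions, `flatMap` over the first layout duplicates rows and `map` over the second drops them.

```scala
// Toy model only: these classes are hypothetical and are NOT Spark's HashedRelation API.
// Hypothesis (unverified): the Unsafe-like layout yields one key per stored row from keys(),
// while the Long-like layout yields each distinct key once.
object ToyRelations {
  final case class Row(key: Long, payload: String)

  // Two rows share key 1, so keys are not unique.
  val sample: Seq[Row] = Seq(Row(1L, "a"), Row(1L, "b"), Row(2L, "c"))

  // UnsafeHashedRelation-like: one (key, row) entry per stored row.
  final class PerRowRelation(rows: Seq[Row]) {
    def keys(): Iterator[Long] = rows.iterator.map(_.key) // key 1 is yielded twice
    def get(k: Long): Iterator[Row] = rows.iterator.filter(_.key == k)
  }

  // LongHashedRelation-like: each distinct key is stored once.
  final class PerKeyRelation(rows: Seq[Row]) {
    def keys(): Iterator[Long] = rows.map(_.key).distinct.iterator
    def getValue(k: Long): Row = rows.find(_.key == k).get // first matching row only
  }

  // `flatMap` approach: every repeated key re-emits all rows stored under it.
  def flatMapStrategy(r: PerRowRelation): List[Row] =
    r.keys().flatMap(k => r.get(k)).toList

  // `map` approach: one row per distinct key, so extra rows under a key are lost.
  def mapStrategy(r: PerKeyRelation): List[Row] =
    r.keys().map(k => r.getValue(k)).toList

  def main(args: Array[String]): Unit = {
    println(flatMapStrategy(new PerRowRelation(sample)).size) // 5 (duplicates)
    println(mapStrategy(new PerKeyRelation(sample)).size)     // 2 (one row lost)
  }
}
```

With three stored rows, two of them sharing key 1, the `flatMap` strategy returns 5 rows and the `map` strategy returns 2, matching the shape of the duplicated and missing rows described above.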
The following fix (the same as #5141) works, but I didn't dig deep enough
to find the root cause of the inconsistency (maybe it is related to
`keyIsUnique`? I am not sure).
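```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.joins.{HashedRelation, LongHashedRelation}

// Rebuilds every build-side row stored in a HashedRelation.
private def reconstructRows(relation: HashedRelation): Iterator[InternalRow] = {
  // It seems that LongHashedRelation and UnsafeHashedRelation don't follow the same
  // criteria while getting values from them.
  // Should review the internals of this part of code.
  relation match {
    // Unique keys: each key maps to exactly one row.
    case relation: LongHashedRelation if relation.keyIsUnique =>
      relation.keys().map(k => relation.getValue(k))
    // Non-unique keys: `get` returns every row stored under the key.
    case relation: LongHashedRelation if !relation.keyIsUnique =>
      relation.keys().flatMap(k => relation.get(k))
    // Other relations (e.g. UnsafeHashedRelation): iterate over the stored values directly.
    case other => other.valuesWithKeyIndex().map(_.getValue)
  }
}
```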
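The three branches of the fix can be mirrored in a toy model to check the invariant they are meant to restore: every stored row comes back exactly once. `ReconstructSketch`, `LongLike`, `UnsafeLike`, and `allValues()` are hypothetical stand-ins (`allValues()` only plays the role of `valuesWithKeyIndex().map(_.getValue)`); this is a sketch of the dispatch logic, not a test of Spark itself.

```scala
// Toy mirror of reconstructRows; hypothetical types, NOT Spark's HashedRelation classes.
object ReconstructSketch {
  final case class Row(key: Long, payload: String)

  sealed trait ToyRelation

  // LongHashedRelation-like: keys() yields each distinct key once.
  final case class LongLike(rows: Seq[Row]) extends ToyRelation {
    val keyIsUnique: Boolean = rows.map(_.key).distinct.size == rows.size
    def keys(): Iterator[Long] = rows.map(_.key).distinct.iterator
    def get(k: Long): Iterator[Row] = rows.iterator.filter(_.key == k) // all rows for k
    def getValue(k: Long): Row = rows.find(_.key == k).get            // first row for k
  }

  // UnsafeHashedRelation-like: allValues() stands in for valuesWithKeyIndex().map(_.getValue).
  final case class UnsafeLike(rows: Seq[Row]) extends ToyRelation {
    def allValues(): Iterator[Row] = rows.iterator // each stored row exactly once
  }

  // Same three branches as the fix above.
  def reconstructRows(relation: ToyRelation): Iterator[Row] = relation match {
    case r: LongLike if r.keyIsUnique => r.keys().map(k => r.getValue(k))
    case r: LongLike                  => r.keys().flatMap(k => r.get(k))
    case r: UnsafeLike                => r.allValues()
  }
}
```

The output order here matches insertion order only because the toy stores rows in a `Seq`; a real hash relation gives no ordering guarantee, so a check against Spark would compare row counts or multisets rather than sequences.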