Alan Gates
Wed, 14 May 2008 10:33:56 -0700
Alan.

pi song wrote:

Alan,

On second thought, a union of two incompatible data streams can cause undefined state in downstream operators, resulting in a mix of good output and garbage. This seems to break the rule of least surprise. What do you think?

Pi

On Wed, May 14, 2008 at 9:06 AM, pi song <[EMAIL PROTECTED]> wrote:

Ok, will follow that.

On 5/14/08, Alan Gates <[EMAIL PROTECTED]> wrote:

I agree that option 3 is the correct course. One note, you say:

    In case that schemas from all the input ports are not compatible, no problem because we won't process it.

What do you mean by "won't process it"? We still have to allow a union operation between two non-compatible inputs (otherwise we can only use union when we have schemas). But the resulting union will not have a schema (since the output no longer has a consistent schema).

Alan.

pi song wrote:

Union is an example of a bag (relational) operator that can have more than one input.

In case that schemas from all the input ports are the same, no problem.
In case that schemas from all the input ports are not compatible, no problem because we won't process it.
In case that schemas from all the input ports are not the same, but compatible, here comes a problem.

Example:

C = UNION A,B ;
Schema(A) = < Int, Chararray >
Schema(B) = < Double, Chararray >

The output schema will get resolved to < Double, Chararray >. Here is the problem: the Union operator at the moment doesn't support casting in any layer. In this case, if we don't cast it, the binary data of the Int will get picked up as a Double by the downstream operator!!

There are a couple of solutions for this:

1) Implement LOUnion and POUnion to support type casting internally.
2) Add casting support in the LOUnion operator and let the LogicalToPhysical compiler generate LOForeach for it.
3) Explicitly insert LOForEach to do the necessary casting between Union and the problematic input. This is analogous to the way we implement implicit casting for expression operators.
4) Don't support the "not same but compatible" case at all.

I will do (3) because it makes the most sense to me and incurs the least impact on other modules. Does anyone have a problem with it?

Pi
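
For illustration only, option (3) and the schema-less fallback discussed above can be sketched in plain Python. This is not Pig code: the widening order, the cast table, and the names `resolve_type`, `resolve_union_schema`, `cast_foreach` and `union` are all hypothetical stand-ins. The idea is that each input whose schema differs from the resolved one gets a cast step in front of the union, and incompatible inputs are still unioned but yield no schema.

```python
# Hypothetical numeric widening order, for illustration; not Pig's actual type table.
WIDENING = ["int", "long", "float", "double"]

# Hypothetical cast functions standing in for real cast operators.
CASTS = {"int": int, "long": int, "float": float, "double": float, "chararray": str}


def resolve_type(a, b):
    """Return the common type of two column types, or None if incompatible."""
    if a == b:
        return a
    if a in WIDENING and b in WIDENING:
        return max(a, b, key=WIDENING.index)  # pick the wider numeric type
    return None


def resolve_union_schema(schemas):
    """Merge input schemas column by column; None means 'no output schema'."""
    if len({len(s) for s in schemas}) != 1:
        return None  # different column counts: treated as incompatible
    merged = []
    for column in zip(*schemas):
        target = column[0]
        for t in column[1:]:
            target = resolve_type(target, t)
            if target is None:
                return None
        merged.append(target)
    return merged


def cast_foreach(rows, schema, target):
    """Stand-in for the inserted foreach: cast an input to the target schema."""
    if schema == target:
        return rows  # same schema, nothing to insert
    return [tuple(CASTS[t](v) for v, t in zip(row, target)) for row in rows]


def union(inputs, schemas):
    """Union the inputs, casting each one to the resolved schema when one exists."""
    target = resolve_union_schema(schemas)
    if target is None:
        # Incompatible inputs: the union is still allowed, but has no schema.
        return [row for rows in inputs for row in rows], None
    out = []
    for rows, schema in zip(inputs, schemas):
        out.extend(cast_foreach(rows, schema, target))
    return out, target


# Usage, mirroring the example in the thread: A is <Int, Chararray>, B is <Double, Chararray>.
A = [(1, "a")]
B = [(2.5, "b")]
rows, schema = union([A, B], [["int", "chararray"], ["double", "chararray"]])
print(rows, schema)
```

Because the cast sits between the input and the union, the union itself stays unaware of types, which is the "least impact on other modules" argument made for option (3).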