corasaurus-hex commented on PR #18457:
URL: https://github.com/apache/datafusion/pull/18457#issuecomment-3507487788
> I do find one thing annoying, but I don't know if it's impacting this PR.
> We are calling these `ArrowFileSource` and `ArrowStreamSource`, but both of
> them are **file** readers, right? It's just that one is stored in a random
> access approach and one is stored in a stream approach. When I see the name
> `ArrowStreamSource` I would intuitively think that means some kind of Arrow
> stream. Especially if I see the two of those next to each other, my intuition
> would be that one is a streaming source and one is a file source. I know you're
> reusing the terminology in the [Arrow
> spec](https://arrow.apache.org/docs/python/ipc.html), so again I may be
> overthinking this.

I find it really annoying, too, but I'm not sure what else to call them.
`ArrowFileFormatSource` and `ArrowStreamFormatSource`? The terminology is so
overloaded here and nothing I've been able to come up with has been
significantly better than the shorter names I'm using in this PR.
I suppose I could make a new `ArrowSource` and `ArrowOpener` that wrap the
two formats and make `ArrowFileSource`/`ArrowStreamSource` and
`ArrowFileOpener`/`ArrowStreamOpener` private to the crate? That might at least
reduce some of the confusion for those not well-versed in the spec/terminology.
I'm very open to whichever solution folks feel is best here.
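
For illustration only, the wrapper idea could look roughly like the sketch below. None of it is code from this PR: the `inner` module, the `Kind` enum, and the `from_prefix` and `layout` helpers are hypothetical names, and the unit structs merely stand in for the real format-specific sources. It assumes the wrapper chooses a layout by checking for the `ARROW1` magic that opens an Arrow IPC file, and it treats everything else as the stream layout, which is a simplification.

```rust
// Hypothetical sketch; these definitions are NOT taken from the PR.

/// Magic bytes that open a file written in the Arrow IPC *file*
/// (random-access) layout.
const ARROW_FILE_MAGIC: &[u8] = b"ARROW1";

mod inner {
    // Stand-ins for the format-specific sources. In the proposal these
    // would be visible only inside the crate.
    #[derive(Debug, PartialEq)]
    pub(crate) struct ArrowFileSource;

    #[derive(Debug, PartialEq)]
    pub(crate) struct ArrowStreamSource;
}

/// Private dispatch enum: callers never name the on-disk layout.
#[derive(Debug, PartialEq)]
enum Kind {
    File(inner::ArrowFileSource),
    Stream(inner::ArrowStreamSource),
}

/// The only public type in this sketch.
#[derive(Debug, PartialEq)]
pub struct ArrowSource {
    kind: Kind,
}

impl ArrowSource {
    /// Pick a layout from the first bytes of the data (assumed behavior).
    /// Anything that does not start with the file magic is treated as the
    /// stream layout here; real code would validate the stream header too.
    pub fn from_prefix(prefix: &[u8]) -> Self {
        let kind = if prefix.starts_with(ARROW_FILE_MAGIC) {
            Kind::File(inner::ArrowFileSource)
        } else {
            Kind::Stream(inner::ArrowStreamSource)
        };
        Self { kind }
    }

    /// Report which layout was selected, mainly for debugging.
    pub fn layout(&self) -> &'static str {
        match self.kind {
            Kind::File(_) => "file",
            Kind::Stream(_) => "stream",
        }
    }
}
```

With something along these lines, users would only ever see `ArrowSource`, and the file/stream naming would stay an internal detail.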
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]