corasaurus-hex commented on PR #18457:
URL: https://github.com/apache/datafusion/pull/18457#issuecomment-3507487788

> I do find one thing annoying, but I don't know if it's impacting this PR.
> We are calling these `ArrowFileSource` and `ArrowStreamSource`, but both of
> them are **file** readers, right? It's just that one is stored in a random
> access approach and one is stored in a stream approach. When I see the name
> `ArrowStreamSource` I would intuitively think that means some kind of Arrow
> stream. Especially if I see the two of those next to each other, my intuition
> would be that one is a streaming source and one is a file source. I know you're
> reusing the terminology in the [Arrow
> spec](https://arrow.apache.org/docs/python/ipc.html), so again I may be
> overthinking this.

I find it really annoying, too, but I'm not sure what else to call them.
`ArrowFileFormatSource` and `ArrowStreamFormatSource`? The terminology is so
overloaded here, and nothing I've been able to come up with has been
significantly better than the shorter names I'm using in this PR.

I suppose I could make a new `ArrowSource` and `ArrowOpener` that wrap the
two formats and make `ArrowFileSource`/`ArrowStreamSource` and
`ArrowFileOpener`/`ArrowStreamOpener` private to the crate? That might at least
reduce some of the confusion for those not well-versed in the spec/terminology.

I'm very open to whichever solution folks feel is best here.