clh-kubernetes commented on issue #1039:
URL: https://github.com/apache/arrow-java/issues/1039#issuecomment-3976084167
@ennuite
Thank you for the quick review. I apologize for the confusion regarding the
API method names in my initial description.
The issue is **not** with the API usage itself, but with the internal
implementation logic triggered when FlightClient.Builder.build() is called with
a Domain Socket location on **Linux**.
Since you are on macOS (which uses KQueue, not Epoll), you cannot reproduce
this locally. The bug is strictly specific to the **Epoll native transport**
path on Linux.
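To see which path a given machine would take, here is a minimal sketch that reports the first domain-socket channel class loadable from the current classpath. It assumes the Netty class name quoted in this thread plus `io.netty.channel.kqueue.KQueueDomainSocketChannel` for the BSD/macOS path. `TransportProbe` and `firstLoadable` are hypothetical names used only for illustration, not part of Arrow or Netty.

```java
import java.util.List;
import java.util.Optional;

class TransportProbe {
    // Lookup order assumed here: Linux (epoll) first, then BSD/macOS (kqueue).
    static final List<String> CANDIDATES = List.of(
        "io.netty.channel.epoll.EpollDomainSocketChannel",
        "io.netty.channel.kqueue.KQueueDomainSocketChannel");

    /** Returns the first class name that can be loaded, without initializing the class. */
    static Optional<String> firstLoadable(List<String> classNames) {
        ClassLoader loader = TransportProbe.class.getClassLoader();
        for (String name : classNames) {
            try {
                Class.forName(name, false, loader);
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not on the classpath (or not loadable); try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println("os.name = " + System.getProperty("os.name"));
        System.out.println("domain-socket channel class found: "
            + firstLoadable(CANDIDATES).orElse("none"));
    }
}
```

If this prints the Epoll class name, the runtime lookup described in this comment would take the Linux branch.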
**Root Cause Analysis**
I decompiled flight-core-17.0.0.jar (16 and 18 behave the same way) and
identified the line that causes the crash inside the
FlightClient.Builder.build() method:
**Code:**

// Internal logic inside FlightClient.Builder.build() or a related Epoll initializer
case LocationSchemes.GRPC_DOMAIN_SOCKET:
  {
    // The implementation is platform-specific, so we have to find the classes at runtime
    builder = NettyChannelBuilder.forAddress(location.toSocketAddress());
    try {
      try {
        // Linux
        builder.channelType(
            Class.forName("io.netty.channel.epoll.EpollDomainSocketChannel")
                .asSubclass(ServerChannel.class));
      }
      // ... (excerpt ends here)

// ❌ BUG: This cast is invalid.
// EpollDomainSocketChannel implements 'Channel' (client), NOT 'ServerChannel'.
// The call above is equivalent to the following, with clazz being the
// EpollDomainSocketChannel class object:
Class<? extends ServerChannel> serverClazz = clazz.asSubclass(ServerChannel.class);
**Why it fails:**
* EpollDomainSocketChannel is a **client-side** channel class.
* ServerChannel is the base interface for **server-side** listeners.
* The correct server class would be EpollServerDomainSocketChannel.
* Calling asSubclass(ServerChannel.class) on the client class throws a
java.lang.ClassCastException at runtime.
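The mechanism can be shown without Netty on the classpath. The following is a minimal sketch using hypothetical stand-in types (`Channel`, `ServerChannel`, `FakeClientChannel`, `FakeServerChannel`) that only mirror the shape of the Netty hierarchy described above; it is not the Arrow or Netty code itself.

```java
class AsSubclassDemo {
    // Hypothetical stand-ins that mirror the client/server split in Netty.
    interface Channel {}
    interface ServerChannel extends Channel {}
    static class FakeClientChannel implements Channel {}        // plays EpollDomainSocketChannel
    static class FakeServerChannel implements ServerChannel {}  // plays EpollServerDomainSocketChannel

    /** Returns true if Class.asSubclass accepts cls under the given bound. */
    static boolean accepts(Class<?> cls, Class<?> bound) {
        try {
            cls.asSubclass(bound);
            return true;
        } catch (ClassCastException e) {
            // asSubclass throws when cls is not assignable to bound.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(FakeClientChannel.class, Channel.class));        // true
        System.out.println(accepts(FakeClientChannel.class, ServerChannel.class));  // false
        System.out.println(accepts(FakeServerChannel.class, ServerChannel.class));  // true
    }
}
```

The second call is the analogue of the failing line: a client-only class checked against the server interface.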
**Corrected Reproduction Steps (Linux Required)**
**Environment:**
* OS: Linux (Ubuntu/CentOS/Alpine) - *Must be Linux to trigger Epoll*
* Java: OpenJDK 11+
* Dependency: {{org.apache.arrow:flight-core:17.0.0}} (or 18.0.0)
* Native Lib: {{io.netty:netty-transport-native-epoll}} must be on the
classpath.
**Code:**
{code:java}
import org.apache.arrow.flight.FlightClient;
import org.apache.arrow.flight.Location;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

public class ReproUdsBug {
  public static void main(String[] args) {
    // 1. Use the correct public API
    Location loc = Location.forGrpcDomainSocket("/tmp/arrow-test.sock");
    // try-with-resources closes the client first, then the allocator
    try (BufferAllocator allocator = new RootAllocator();
        // 2. The crash happens INSIDE this .build() call
        FlightClient client = FlightClient.builder(allocator, loc).build()) {
      System.out.println("Success: Client created.");
    } catch (Exception e) {
      System.err.println("Error occurred: " + e.getClass().getSimpleName());
      e.printStackTrace();
      // Expected on Linux with the Epoll transport on the classpath:
      // java.lang.ClassCastException naming
      // io.netty.channel.epoll.EpollDomainSocketChannel, raised by
      // asSubclass(ServerChannel.class)
    }
  }
}
{code}
**Summary**
* **API Usage**: The user code is correct (using {{forGrpcDomainSocket}} and
{{builder().build()}}).
* **Trigger**: Calling {{.build()}} with a Domain Socket location on Linux.
* **Defect**: Internal reflection logic incorrectly casts a Client Channel
class to a Server Channel interface.
* **Regression**: Works in v15.0.0, broken in v16.0.0+.
Could you please verify this on a Linux environment or delegate to a team
member with Linux access? Because this code path builds a client, the fix is
most likely to bound the class with {{Channel}} instead of {{ServerChannel}};
{{EpollServerDomainSocketChannel}} would only be the right class if server
logic was intended.
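One way to guard against this kind of mismatch is to check the bound explicitly and report both names. The sketch below is an illustration only: `ChannelTypeLoader` and `loadAs` are hypothetical, and JDK collection types stand in for the Netty classes so that it runs without dependencies. In the client path the bound passed in would presumably be `io.netty.channel.Channel`.

```java
class ChannelTypeLoader {
    /**
     * Loads className and checks it against bound, so a mismatch is reported
     * with both names instead of a bare ClassCastException.
     */
    static <T> Class<? extends T> loadAs(String className, Class<T> bound)
            throws ClassNotFoundException {
        Class<?> cls = Class.forName(className);
        if (!bound.isAssignableFrom(cls)) {
            throw new IllegalStateException(
                className + " is not a subtype of " + bound.getName());
        }
        return cls.asSubclass(bound);
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins: ArrayList implements List but not Queue.
        System.out.println(loadAs("java.util.ArrayList", java.util.List.class));
        try {
            loadAs("java.util.ArrayList", java.util.Queue.class);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A ClassNotFoundException is still propagated unchanged, so a caller could keep using it to fall back from the Linux class to the BSD one.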
Thanks again for your help!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.