I need to load a PFA (Portable Format for Analytics) model that can be around 30 GB and later process it with Hadrian, the Java implementation of PFA (https://github.com/opendatagroup/hadrian).
I would like to execute this transformation step on one specific worker of the cluster, since I don't want to load 30 GB on every single worker node. Unfortunately, Hadrian cannot be executed in a distributed way. So my question is: is there a way to do some routing with Flink so that this particular transformation step always executes on the same worker node? Perhaps my approach is completely wrong, so if anybody has any suggestions I would be more than happy to hear them :) Thanks
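To make the question more concrete, here is a minimal sketch of what I have in mind, assuming Flink's DataStream API: setting the operator's parallelism to 1 should force all records through a single subtask, so the model would only be loaded once. The scoring call inside `map` is hypothetical, just a placeholder for the Hadrian engine:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SingleWorkerRoutingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("record-1", "record-2", "record-3")
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String value) {
                   // Hypothetical: score `value` with the Hadrian/PFA engine,
                   // which would be loaded once in this single subtask.
                   return "scored:" + value;
               }
           })
           // Parallelism 1 restricts this operator to one subtask, so the
           // 30 GB model does not need to be replicated to every worker.
           .setParallelism(1)
           .print();

        env.execute("single-worker routing sketch");
    }
}
```

I'm not sure whether `setParallelism(1)` also guarantees the subtask is scheduled on the *same* physical node across restarts, which is part of what I'm asking.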