lhotari edited a comment on pull request #11343: URL: https://github.com/apache/pulsar/pull/11343#issuecomment-884747001
@jerrypeng

> Why not?

For the thread runtime, I meant that the directory would also have to be unique for each function instance in that case; a JVM-level unique directory isn't sufficient. It would be possible to have a unique directory for every parallel instance of the function, but the problem this brings is **excessive disk space usage**. Each parallel instance would have to have its own unique directory. Some NAR/JAR files are very large; 50MB isn't uncommon. For example, pulsar-io-debezium-*.nar files are over 80MB (there are many others in this range). With a high parallelism value such as 8, several hundred MB of additional disk space might be required if the solution were based on a unique directory per function instance.

> Even with your solution, it does not prevent the scenario when two different functions are started by submitting distinct NARs/JARs that is named the same.

This PR addresses the case where the NARs or JARs use the same name. The content hash will be used as part of the directory name. The file locking solution prevents race conditions that could happen when parallel function instances start up at the same time (a very common scenario when the parallelism exceeds the number of function workers and the process runtime is used).

Is there some problem with the changes proposed in the PR? I'd like to complete work on this PR before my vacation, which starts tomorrow.
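To make the idea concrete, here is a minimal sketch of combining a content-hash directory name with a file lock. It is not the code in the PR: the class name `SharedNarCache`, the method `prepare`, the `.lock` file naming, and the choice of SHA-256 are all assumptions made for illustration, and copying the archive stands in for actually unpacking it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
public class SharedNarCache {

    // Hex-encoded SHA-256 of the archive content (hash algorithm is an assumption).
    static String contentHash(Path archive) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(archive), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading through the stream feeds the digest.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns a directory shared by all instances that use the same archive content.
    // 'synchronized' serializes threads inside one JVM, because FileLock is held per JVM
    // and overlapping lock attempts from the same JVM throw OverlappingFileLockException.
    static synchronized Path prepare(Path archive, Path baseDir)
            throws IOException, NoSuchAlgorithmException {
        Files.createDirectories(baseDir);
        // Same file name with different content yields a different directory.
        String dirName = archive.getFileName() + "-" + contentHash(archive);
        Path targetDir = baseDir.resolve(dirName);
        Path lockFile = baseDir.resolve(dirName + ".lock");

        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // blocks while another process holds the lock
            if (!Files.isDirectory(targetDir)) {
                // Populate a temporary directory first, then rename it into place,
                // so a half-populated target directory is never visible.
                Path tmp = Files.createTempDirectory(baseDir, dirName + ".tmp");
                Files.copy(archive, tmp.resolve(archive.getFileName())); // stand-in for unpacking
                Files.move(tmp, targetDir, StandardCopyOption.ATOMIC_MOVE);
            }
        }
        return targetDir;
    }
}
```

With this shape, eight parallel instances of the same function share a single directory instead of creating eight copies, while two different archives that happen to share a file name end up in separate directories.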
