MedAnd commented on PR #7252:
URL: https://github.com/apache/hadoop/pull/7252#issuecomment-3630402071

   Hi @anujmodi / @KeeProMise / @haiyang1987 / @Hexiaoqiao / @aajisaka / 
@ZanderXu,
   
   I’m exploring options for local development where my PySpark Jupyter 
notebooks read files from Azurite (both running in local containers), with the 
aim of later running the same notebooks with minimal changes on Azure Synapse 
Spark in production.
   
   My goals are:
   
   - Minimal or zero code changes between local and production environments
   - Local development with PySpark Jupyter running as a Docker container
   - Local development with Azurite (Microsoft's Azure Blob Storage emulator) 
running as a Docker container
   - Efficient reading/writing of files from Azurite, preferring the official 
Hadoop JARs (Java) over the Azure SDK for Python
   
   I understand the Hadoop Azure connector (wasb:// / abfs://) is the 
recommended approach for Azure Synapse, but for local development against 
Azurite I’m unsure which JAR(s) and configuration to use from PySpark Jupyter 
notebooks. Because PySpark Jupyter and Azurite run in separate containers, the 
connector must reach Azurite at its hostname on the shared Docker internal 
network; I cannot use localhost or 127.0.0.1.
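   For context, this is the kind of configuration I’ve been sketching. The 
account name and key below are Azurite’s well-known development defaults, the 
hostname `azurite` is just my container’s name on the Docker network, and the 
`hadoop-azure` version and property names are my best guesses from the docs — 
whether they can be made to work against a non-localhost Azurite endpoint is 
exactly what I’m unsure about:

   ```python
   # Sketch of the Spark/Hadoop properties I *think* are needed for Azurite.
   # devstoreaccount1 and its key are Azurite's built-in development defaults;
   # "azurite" is the container's hostname on my Docker network.

   AZURITE_ACCOUNT = "devstoreaccount1"
   AZURITE_KEY = (
       "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6"
       "tq/K1SZFPTOtr/KBHBeksoGMGw=="
   )

   spark_conf = {
       # Pull in the Azure connector; the version should match the Hadoop
       # shipped with the Spark image (3.3.6 is just my current guess).
       "spark.jars.packages": "org.apache.hadoop:hadoop-azure:3.3.6",
       # Shared-key auth for the emulator account (property name taken from
       # the hadoop-azure docs; it is unclear to me whether the endpoint can
       # then be redirected to http://azurite:10000 instead of real Azure).
       f"spark.hadoop.fs.azure.account.key.{AZURITE_ACCOUNT}"
       ".blob.core.windows.net": AZURITE_KEY,
   }

   # I would pass these to SparkSession.builder via .config(k, v) per pair.
   ```

   The open question is whether the wasb:// / abfs:// drivers can be pointed 
at an emulator host other than 127.0.0.1 at all, and if so through which 
property.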
   
   Any guidance on the best practice for this scenario would be greatly 
appreciated. 
   
   **PS. Thanks for your work on these connectors!**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

