jerry-024 commented on code in PR #7902:
URL: https://github.com/apache/paimon/pull/7902#discussion_r3272114303


##########
paimon-python/pypaimon/filesystem/jindo_file_system_handler.py:
##########
@@ -24,17 +24,83 @@
 
 try:
     import pyjindo.fs as jfs
+    import pyjindo.ossfs as jossfs

Review Comment:
   Compatibility regression: `pyjindo.ossfs` should not gate `JINDO_AVAILABLE`.
   
   `pyjindo.ossfs` now shares this try-block with `pyjindo.fs` / 
`pyjindo.util`, so it gates `JINDO_AVAILABLE`. But `JindoFileSystemHandler` 
(the pre-existing PyArrow FileIO path) only needs `pyjindo.fs` + 
`pyjindo.util`. On a `pyjindosdk` version that ships `fs`/`util` but not 
`ossfs`, this `ImportError` flips `JINDO_AVAILABLE` to `False` and silently 
disables the previously-working PyArrow jindo path — a regression for existing 
users.
   
   Suggest splitting into two independent flags:
   
   ```python
   try:
       import pyjindo.fs as jfs
       import pyjindo.util as jutil
       JINDO_AVAILABLE = True
   except ImportError:
       JINDO_AVAILABLE = False
       jfs = None
       jutil = None
   
   try:
       import pyjindo.ossfs as jossfs
       JINDO_OSSFS_AVAILABLE = True
   except ImportError:
       jossfs = None
       JINDO_OSSFS_AVAILABLE = False
   ```
   
   Then:
   - `build_jindo_config` stays gated on `JINDO_AVAILABLE` (it only needs 
`jutil`).
   - `create_jindo_oss_filesystem` and `_use_jindo_oss_backend` should gate on 
`JINDO_AVAILABLE and JINDO_OSSFS_AVAILABLE` — the PVFS jindo backend needs both 
`jossfs` and, via `build_jindo_config`, `jutil`.
   - Export `JINDO_OSSFS_AVAILABLE` so `pvfs.py` can import it.
   - `test_jindo_falls_back_to_ossfs_when_unavailable` should then patch 
`JINDO_OSSFS_AVAILABLE` to exercise the real fallback branch.
   



##########
paimon-python/pypaimon/filesystem/jindo_file_system_handler.py:
##########
@@ -24,17 +24,83 @@
 
 try:
     import pyjindo.fs as jfs
+    import pyjindo.ossfs as jossfs
     import pyjindo.util as jutil
     JINDO_AVAILABLE = True
 except ImportError:
     JINDO_AVAILABLE = False
     jfs = None
+    jossfs = None
     jutil = None
 
 from pypaimon.common.options import Options
 from pypaimon.common.options.config import OssOptions
 
 
+def build_jindo_config(catalog_options: Options):
+    """Build a pyjindo ``Config`` from OSS catalog options.
+
+    Shared by ``JindoFileSystemHandler`` (the PyArrow FileIO path) and
+    ``create_jindo_oss_filesystem`` (the PVFS fsspec path) so both jindo entry
+    points consume exactly the same credential / endpoint options.
+    """
+    if not JINDO_AVAILABLE:
+        raise ImportError("Module pyjindo is not available. Please install 
pyjindosdk.")
+
+    config = jutil.Config()
+
+    access_key_id = catalog_options.get(OssOptions.OSS_ACCESS_KEY_ID)
+    access_key_secret = catalog_options.get(OssOptions.OSS_ACCESS_KEY_SECRET)
+    security_token = catalog_options.get(OssOptions.OSS_SECURITY_TOKEN)
+    endpoint = catalog_options.get(OssOptions.OSS_ENDPOINT)
+    region = catalog_options.get(OssOptions.OSS_REGION)
+
+    if access_key_id:
+        config.set("fs.oss.accessKeyId", access_key_id)
+    if access_key_secret:
+        config.set("fs.oss.accessKeySecret", access_key_secret)
+    if security_token:
+        config.set("fs.oss.securityToken", security_token)
+    if endpoint:
+        endpoint_clean = endpoint.replace('http://', '').replace('https://', 
'')
+        config.set("fs.oss.endpoint", endpoint_clean)
+    if region:
+        config.set("fs.oss.region", region)
+    config.set("fs.oss.user.agent.features", "pypaimon")
+    return config
+
+
+def create_jindo_oss_filesystem(root_uri: str, catalog_options: Options):
+    """Create an fsspec-compatible ``JindoOssFileSystem`` for an OSS bucket.
+
+    ``PaimonVirtualFileSystem`` uses this to back OSS reads/writes with the
+    native JindoSDK instead of ``ossfs``. JindoSDK writes objects via
+    PutObject / multipart upload, so it never issues OSS ``AppendObject`` --
+    the call that fails with ``PositionNotEqualToLength`` (409) on the OSS
+    data-acceleration endpoint when ``ossfs`` flushes a multi-chunk write.
+
+    ``root_uri`` is the bucket root, e.g. ``oss://my-bucket/``; it must carry
+    the bucket so ``JindoOssFileSystem`` can re-attach the ``oss://`` scheme to
+    the bucket-relative paths that ``PaimonVirtualFileSystem`` passes in.
+    """
+    if not JINDO_AVAILABLE:
+        raise ImportError("Module pyjindo is not available. Please install 
pyjindosdk.")
+
+    return jossfs.JindoOssFileSystem(
+        uri=root_uri,
+        config=build_jindo_config(catalog_options),
+        # PaimonVirtualFileSystem owns directory semantics for the virtual FS;
+        # the backing object-store fs must not auto-create dir-marker objects.
+        auto_mkdir=False,
+        # Bypass fsspec's _Cached metaclass instance cache, so the only
+        # reference to this filesystem -- and to its underlying native jindo
+        # connection -- is the PaimonRealStorage cache in PVFS. On token
+        # refresh PVFS replaces that entry and the native resources can be
+        # released, instead of being pinned forever by fsspec's global cache.
+        skip_instance_cache=True,

Review Comment:
   This pins PVFS to a specific `pyjindosdk` API surface: 
`pyjindo.ossfs.JindoOssFileSystem` must exist and accept the `uri` / `config` / 
`auto_mkdir` / `skip_instance_cache` keyword arguments. The PR description 
mentions verification on pyjindo 6.10.2, but nothing declares a minimum 
supported version.
   
   If an older supported `pyjindosdk` version lacks the `ossfs` module or this 
constructor signature, users would hit an `ImportError` / `TypeError` rather 
than the intended clean fallback to `ossfs`.
   
   Could you confirm the minimum `pyjindosdk` version that provides this API, 
and declare it explicitly (packaging metadata / docs / a comment here)? 
Otherwise a version/signature guard around the `ossfs` import and this 
constructor call would be needed.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to