Re: [I] [Question] Data import error [incubator-hugegraph-toolchain]

via GitHub Wed, 07 Aug 2024 19:05:52 -0700


dosubot[bot] commented on issue #617:
URL: 
https://github.com/apache/incubator-hugegraph-toolchain/issues/617#issuecomment-2274801815


   An `EOFException` with a "not found" message during a batch insert with the 
loader tool can mean that the client received an incomplete response, either 
because of a server-side issue or because a network problem cut the connection. 
Here are steps to resolve this issue:
   
   1. **Check Server Logs**: Look at the server logs to identify any issues 
that might be causing the server to close the connection prematurely.
   2. **Increase Retry Times**: Raise `retryTimes` and `retryInterval` in the 
`LoadOptions` configuration so the loader retries more often and waits longer 
between attempts, giving the server time to recover from transient issues.
   3. **Network Stability**: Ensure that the network connection between the 
client and the server is stable and has low latency.
   4. **Use Docker for Isolation**: Consider running HugeGraph-Loader in Docker 
to isolate its environment and reduce potential conflicts. You can start the 
loader container with `docker run -itd --name loader hugegraph/loader` and 
mount your data files into it.
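   
   For step 3, one quick check is to time a few TCP connection attempts from 
the loader host to the server. The sketch below is a hypothetical helper 
(`ServerProbe` is not part of HugeGraph), and `localhost:8080` is only an 
assumed address; substitute your server's host and port. It tests reachability 
only, not whether the REST API answers correctly.
   
   ```java
   import java.io.IOException;
   import java.net.InetSocketAddress;
   import java.net.Socket;
   
   /** Hypothetical helper, not part of HugeGraph: times TCP connects to a server. */
   final class ServerProbe {
   
       /** Returns the connect time in milliseconds, or -1 if the connect fails. */
       static long connectMillis(String host, int port, int timeoutMillis) {
           long start = System.nanoTime();
           try (Socket socket = new Socket()) {
               socket.connect(new InetSocketAddress(host, port), timeoutMillis);
               return (System.nanoTime() - start) / 1_000_000L;
           } catch (IOException e) {
               // Refused, timed out, or unreachable.
               return -1L;
           }
       }
   
       public static void main(String[] args) {
           // Assumed address for illustration; pass your own host and port.
           String host = args.length > 0 ? args[0] : "localhost";
           int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;
           for (int attempt = 1; attempt <= 5; attempt++) {
               long millis = connectMillis(host, port, 3000);
               System.out.println("attempt " + attempt + ": "
                                  + (millis < 0 ? "failed" : millis + " ms"));
           }
       }
   }
   ```
   
   Failed attempts or widely varying connect times point toward the network 
rather than the loader configuration.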
   
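   Retries are handled by `execute()` in the `BatchInsertTask` class. It 
rethrows immediately when the exception type or message matches one of the 
unacceptable lists, and otherwise passes the exception to `waitThenRetry`: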
   
   ```java
   @Override
   public void execute() {
       int retryCount = 0;
       do {
           try {
               if (this.mapping.updateStrategies().isEmpty()) {
                   this.insertBatch(this.batch, this.options().checkVertex);
               } else {
                   this.updateBatch(this.batch, this.options().checkVertex);
               }
               break;
           } catch (ClientException e) {
               LOG.debug("client exception: {}", e.getMessage());
               Throwable cause = e.getCause();
               if (cause != null && cause.getMessage() != null) {
                   if (StringUtils.containsAny(cause.getMessage(),
                                               UNACCEPTABLE_MESSAGES)) {
                       throw e;
                   }
               }
               retryCount = this.waitThenRetry(retryCount, e);
           } catch (ServerException e) {
               String message = e.getMessage();
               LOG.error("server exception: {}", message);
               if (UNACCEPTABLE_EXCEPTIONS.contains(e.exception())) {
                   throw e;
               }
               if (StringUtils.containsAny(message, UNACCEPTABLE_MESSAGES)) {
                   throw e;
               }
               retryCount = this.waitThenRetry(retryCount, e);
           }
       } while (retryCount > 0 && retryCount <= this.options().retryTimes);
   }
   ```
   
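   The `waitThenRetry` method implements the backoff: it rethrows the exception once `retryTimes` is exceeded, and otherwise sleeps `2^retryCount * retryInterval` seconds before the next attempt: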
   
   ```java
   private int waitThenRetry(int retryCount, RuntimeException e) {
       LoadOptions options = this.options();
       if (options.retryTimes <= 0) {
           return retryCount;
       }
   
       if (++retryCount > options.retryTimes) {
           LOG.error("Batch insert has been retried more than {} times",
                     options.retryTimes);
           throw e;
       }
   
       // Exponential backoff in seconds: 2^retryCount * retryInterval
       long interval = (1L << retryCount) * options.retryInterval;
       LOG.debug("Batch insert will sleep {} seconds then do the {}th retry",
                 interval, retryCount);
       try {
           Thread.sleep(interval * 1000L);
       } catch (InterruptedException ignored) {
           // That's fine, just continue.
       }
       return retryCount;
   }
   ```
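   
   To see what a given setting costs in wall-clock time, the same formula can 
be evaluated on its own. This is a minimal sketch: `BackoffSchedule` is a 
hypothetical helper, not part of hugegraph-loader, and the values 3 and 10 are 
assumed purely for illustration.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   /** Hypothetical helper, not part of hugegraph-loader. */
   final class BackoffSchedule {
   
       /**
        * Returns the sleep in seconds before retry 1..retryTimes, using the
        * same formula as waitThenRetry: (1L << retryCount) * retryInterval.
        */
       static List<Long> delaysSeconds(int retryTimes, long retryIntervalSeconds) {
           List<Long> delays = new ArrayList<>();
           for (int retryCount = 1; retryCount <= retryTimes; retryCount++) {
               delays.add((1L << retryCount) * retryIntervalSeconds);
           }
           return delays;
       }
   
       public static void main(String[] args) {
           // Assumed settings: retryTimes = 3, retryInterval = 10 seconds.
           List<Long> delays = delaysSeconds(3, 10);
           long total = delays.stream().mapToLong(Long::longValue).sum();
           // prints "[20, 40, 80] total=140s"
           System.out.println(delays + " total=" + total + "s");
       }
   }
   ```
   
   Because the delay doubles on every attempt, each extra retry roughly doubles 
the total wait, so raise `retryTimes` in small steps.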
   
   Additionally, since your data source is HDFS, ensure that:
   1. If the data is split across multiple files, the files are collected in a 
single directory.
   2. The files use a supported format (TEXT, CSV, JSON) and, if compressed, a 
supported compression type (GZIP, BZ2, XZ, LZMA, SNAPPY_RAW, SNAPPY_FRAMED, Z, 
DEFLATE, LZ4_BLOCK, LZ4_FRAMED, ORC, or PARQUET).
   3. If Kerberos authentication is enabled, its configuration is complete, 
including the `krb5.conf` file, principal, and keytab 
<sup>[[1]](https://github.com/apache/incubator-hugegraph-toolchain/blob/master/hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/BatchInsertTask.java)</sup><sup>[[2]](https://github.com/apache/incubator-hugegraph-doc/blob/master/content/en/docs/quickstart/hugegraph-loader.md)</sup><sup>[[3]](https://github.com/apache/incubator-hugegraph-toolchain/blob/master/hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java)</sup><sup>[[4]](https://github.com/apache/incubator-hugegraph-toolchain/blob/master/hugegraph-loader/src/main/java/org/apache/hugegraph/loader/reader/hdfs/HDFSFileReader.java)</sup>.
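   
   Before starting a long load, the format and compression values can be 
checked against the lists above. The sketch below is a hypothetical pre-flight 
helper (`HdfsSourceCheck` is not a loader class); it only compares names and 
treats a null or empty compression value as uncompressed, which is an 
assumption of this example.
   
   ```java
   import java.util.Locale;
   import java.util.Set;
   
   /** Hypothetical pre-flight check, not part of hugegraph-loader. */
   final class HdfsSourceCheck {
   
       // Names copied from the checklist above.
       static final Set<String> FORMATS = Set.of("TEXT", "CSV", "JSON");
       static final Set<String> COMPRESSIONS = Set.of(
               "GZIP", "BZ2", "XZ", "LZMA", "SNAPPY_RAW", "SNAPPY_FRAMED",
               "Z", "DEFLATE", "LZ4_BLOCK", "LZ4_FRAMED", "ORC", "PARQUET");
   
       /** Assumption: a null or empty compression means uncompressed input. */
       static boolean isSupported(String format, String compression) {
           if (format == null
               || !FORMATS.contains(format.toUpperCase(Locale.ROOT))) {
               return false;
           }
           if (compression == null || compression.trim().isEmpty()) {
               return true;
           }
           return COMPRESSIONS.contains(compression.toUpperCase(Locale.ROOT));
       }
   
       public static void main(String[] args) {
           System.out.println(isSupported("csv", "gzip")); // prints "true"
           System.out.println(isSupported("xml", null));   // prints "false"
       }
   }
   ```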
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


