dosubot[bot] commented on issue #617:
URL:
https://github.com/apache/incubator-hugegraph-toolchain/issues/617#issuecomment-2274801815
The `EOFException` with a "not found" message during a batch insert
operation using the loader tool can be caused by a server-side issue or a
network problem that results in an incomplete response from the server. Here
are steps to resolve this issue:
1. **Check Server Logs**: Look at the server logs to identify any issues
that might be causing the server to close the connection prematurely.
2. **Increase Retry Times**: Raise `retryTimes` and `retryInterval` in the
`LoadOptions` configuration so the loader retries more often and waits longer
between attempts, giving the server more time to recover from transient issues.
3. **Network Stability**: Ensure that the network connection between the
client and the server is stable and has low latency.
4. **Use Docker for Isolation**: If you are using HugeGraph-Loader, consider
using Docker to isolate the environment and reduce potential conflicts. You can
deploy the loader service using `docker run -itd --name loader
hugegraph/loader` and mount your data files into the container.
Here is the relevant code snippet that handles retries in the
`BatchInsertTask` class:
```java
@Override
public void execute() {
    int retryCount = 0;
    do {
        try {
            if (this.mapping.updateStrategies().isEmpty()) {
                this.insertBatch(this.batch, this.options().checkVertex);
            } else {
                this.updateBatch(this.batch, this.options().checkVertex);
            }
            break;
        } catch (ClientException e) {
            LOG.debug("client exception: {}", e.getMessage());
            Throwable cause = e.getCause();
            if (cause != null && cause.getMessage() != null) {
                if (StringUtils.containsAny(cause.getMessage(),
                                            UNACCEPTABLE_MESSAGES)) {
                    throw e;
                }
            }
            retryCount = this.waitThenRetry(retryCount, e);
        } catch (ServerException e) {
            String message = e.getMessage();
            LOG.error("server exception: {}", message);
            if (UNACCEPTABLE_EXCEPTIONS.contains(e.exception())) {
                throw e;
            }
            if (StringUtils.containsAny(message, UNACCEPTABLE_MESSAGES)) {
                throw e;
            }
            retryCount = this.waitThenRetry(retryCount, e);
        }
    } while (retryCount > 0 && retryCount <= this.options().retryTimes);
}
```
The `waitThenRetry` method is used to handle the retry logic:
```java
private int waitThenRetry(int retryCount, RuntimeException e) {
    LoadOptions options = this.options();
    if (options.retryTimes <= 0) {
        return retryCount;
    }

    if (++retryCount > options.retryTimes) {
        LOG.error("Batch insert has been retried more than {} times",
                  options.retryTimes);
        throw e;
    }

    long interval = (1L << retryCount) * options.retryInterval;
    LOG.debug("Batch insert will sleep {} seconds then do the {}th retry",
              interval, retryCount);
    try {
        Thread.sleep(interval * 1000L);
    } catch (InterruptedException ignored) {
        // That's fine, just continue.
    }
    return retryCount;
}
```
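To get a feel for how long the loader may wait before giving up, the backoff
expression from `waitThenRetry` can be evaluated on its own. The sketch below
is only an illustration: the class `BackoffSchedule`, its method names, and the
values `retryTimes = 3` and `retryInterval = 10` are assumptions chosen for the
example, not part of the loader.

```java
// Illustrative only: reproduces the interval arithmetic from waitThenRetry.
class BackoffSchedule {

    // Same expression as in waitThenRetry: (1L << retryCount) * retryInterval.
    // retryCount is the already-incremented counter, so the first retry uses 1.
    static long intervalSeconds(int retryCount, long retryInterval) {
        return (1L << retryCount) * retryInterval;
    }

    // Total sleep time if every attempt fails until the retry limit is hit.
    static long totalWaitSeconds(int retryTimes, long retryInterval) {
        long total = 0L;
        for (int retry = 1; retry <= retryTimes; retry++) {
            total += intervalSeconds(retry, retryInterval);
        }
        return total;
    }

    public static void main(String[] args) {
        int retryTimes = 3;       // assumed value for the example
        long retryInterval = 10L; // assumed value, in seconds
        for (int retry = 1; retry <= retryTimes; retry++) {
            System.out.printf("retry %d sleeps %d s%n",
                              retry, intervalSeconds(retry, retryInterval));
        }
        System.out.printf("total wait: %d s%n",
                          totalWaitSeconds(retryTimes, retryInterval));
    }
}
```

Because the interval doubles with each retry, raising `retryTimes` lengthens
the total wait much faster than raising `retryInterval` does.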
Additionally, since your data source is HDFS, ensure that:
1. If the data is spread across multiple files, the files are kept in a
single directory.
2. The files use a supported file format (TEXT, CSV, JSON) or compressed file
type (GZIP, BZ2, XZ, LZMA, SNAPPY_RAW, SNAPPY_FRAMED, Z, DEFLATE, LZ4_BLOCK,
LZ4_FRAMED, ORC, and PARQUET).
3. If Kerberos authentication is used, the Kerberos configuration is set up
correctly, including the `krb5.conf` file, principal,
and keytab
<sup>[[1]](https://github.com/apache/incubator-hugegraph-toolchain/blob/master/hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/BatchInsertTask.java)</sup><sup>[[2]](https://github.com/apache/incubator-hugegraph-doc/blob/master/content/en/docs/quickstart/hugegraph-loader.md)</sup><sup>[[3]](https://github.com/apache/incubator-hugegraph-toolchain/blob/master/hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java)</sup><sup>[[4]](https://github.com/apache/incubator-hugegraph-toolchain/blob/master/hugegraph-loader/src/main/java/org/apache/hugegraph/loader/reader/hdfs/HDFSFileReader.java)</sup>.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]