This is an automated email from the ASF dual-hosted git repository.

lidongdai pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/seatunnel-website.git

The following commit(s) were added to refs/heads/main by this push:
new d5c444accd2 Update faq.md (#245)

d5c444accd2 is described below

commit d5c444accd2e3dc3863f9601d0137c1658e185fd
Author: David Zollo <[email protected]>
AuthorDate: Sun Jun 25 18:48:01 2023 +0800

Update faq.md (#245)
---
versioned_docs/version-2.3.2/faq.md | 177 ------------------------------------
1 file changed, 177 deletions(-)
diff --git a/versioned_docs/version-2.3.2/faq.md b/versioned_docs/version-2.3.2/faq.md
index 6903043e087..58945d3b02d 100644
--- a/versioned_docs/version-2.3.2/faq.md
+++ b/versioned_docs/version-2.3.2/faq.md
@@ -1,8 +1,5 @@
# FAQs
-## Why should I install a computing engine like Spark or Flink?
-
-SeaTunnel now uses computing engines such as Spark and Flink to complete resource scheduling and node communication, so we can focus on the ease of use of data synchronization and the development of high-performance components. But this is only temporary.
## I have a question, and I cannot solve it by myself
@@ -61,13 +58,6 @@ your string 1
Refer to: [lightbend/config#456](https://github.com/lightbend/config/issues/456).
-## Is SeaTunnel supportted in Azkaban, Oozie, DolphinScheduler?
-
-Of course! See the screenshot below:
-
-
-
-
## Does SeaTunnel have a case for configuring multiple sources, such as configuring elasticsearch and hdfs in source at the same time?
@@ -91,117 +81,6 @@ sink {
}
```
-## Are there any HBase plugins?
-
-There is an hbase input plugin. You can download it from here: https://github.com/garyelephant/waterdrop-input-hbase .
-
-## How can I use SeaTunnel to write data to Hive?
-
-```
-env {
- spark.sql.catalogImplementation = "hive"
- spark.hadoop.hive.exec.dynamic.partition = "true"
- spark.hadoop.hive.exec.dynamic.partition.mode = "nonstrict"
-}
-
-source {
- sql = "insert into ..."
-}
-
-sink {
- // The data has been written to hive through the sql source. This is just a placeholder, it does not actually work.
- stdout {
- limit = 1
- }
-}
-```
-
-In addition, SeaTunnel has implemented a `Hive` output plugin after version `1.5.7` in `1.x` branch; in `2.x` branch. The Hive plugin for the Spark engine has been supported from version `2.0.5`: https://github.com/apache/seatunnel/issues/910.
-
-## How does SeaTunnel write multiple instances of ClickHouse to achieve load balancing?
-
-1. Write distributed tables directly (not recommended)
-
-2. Add a proxy or domain name (DNS) in front of multiple instances of ClickHouse:
-
- ```
- {
- output {
- clickhouse {
- host = "ck-proxy.xx.xx:8123"
- # Local table
- table = "table_name"
- }
- }
- }
- ```
-3. Configure multiple instances in the configuration:
-
- ```
- {
- output {
- clickhouse {
- host = "ck1:8123,ck2:8123,ck3:8123"
- # Local table
- table = "table_name"
- }
- }
- }
- ```
-4. Use cluster mode:
-
- ```
- {
- output {
- clickhouse {
- # Configure only one host
- host = "ck1:8123"
- cluster = "clickhouse_cluster_name"
- # Local table
- table = "table_name"
- }
- }
- }
- ```
-
-## How can I solve OOM when SeaTunnel consumes Kafka?
-
-In most cases, OOM is caused by not having a rate limit for consumption. The solution is as follows:
-
-For the current limit of Spark consumption of Kafka:
-
-1. Suppose the number of partitions of Kafka `Topic 1` you consume with KafkaStream = N.
-
-2. Assuming that the production speed of the message producer (Producer) of `Topic 1` is K messages/second, the speed of write messages to the partition must be uniform.
-
-3. Suppose that, after testing, it is found that the processing capacity of Spark Executor per core per second is M.
-
-The following conclusions can be drawn:
-
-1. If you want to make Spark's consumption of `Topic 1` keep up with its production speed, then you need `spark.executor.cores` * `spark.executor.instances` >= K / M
-
-2. When a data delay occurs, if you want the consumption speed not to be too fast, resulting in spark executor OOM, then you need to configure `spark.streaming.kafka.maxRatePerPartition` <= (`spark.executor.cores` * `spark.executor.instances`) * M / N
-
-3. In general, both M and N are determined, and the conclusion can be drawn from 2: The size of `spark.streaming.kafka.maxRatePerPartition` is positively correlated with the size of `spark.executor.cores` * `spark.executor.instances`, and it can be increased while increasing the resource `maxRatePerPartition` to speed up consumption.
-
-
-
-## How can I solve the Error `Exception in thread "main" java.lang.NoSuchFieldError: INSTANCE`?
-
-The reason is that the version of httpclient.jar that comes with the CDH version of Spark is lower, and The httpclient version that ClickHouse JDBC is based on is 4.5.2, and the package versions conflict. The solution is to replace the jar package that comes with CDH with the httpclient-4.5.2 version.
-
-## The default JDK of my Spark cluster is JDK7. After I install JDK8, how can I specify that SeaTunnel starts with JDK8?
-
-In SeaTunnel's config file, specify the following configuration:
-
-```shell
-spark {
- ...
- spark.executorEnv.JAVA_HOME="/your/java_8_home/directory"
- spark.yarn.appMasterEnv.JAVA_HOME="/your/java_8_home/directory"
- ...
-}
-```
## How do I specify a different JDK version for SeaTunnel on Yarn?
@@ -224,17 +103,6 @@ For example, if you want to set the JDK version to JDK8, there are two cases:
If you run in local mode, you need to modify the `start-seatunnel.sh` startup script. After `spark-submit`, add a parameter `--driver-memory 4g` . Under normal circumstances, local mode is not used in the production environment. Therefore, this parameter generally does not need to be set during On Yarn. See: [Application Properties](https://spark.apache.org/docs/latest/configuration.html#application-properties) for details.
-## Where can I place self-written plugins or third-party jdbc.jars to be loaded by SeaTunnel?
-
-Place the Jar package under the specified structure of the plugins directory:
-
-```bash
-cd SeaTunnel
-mkdir -p plugins/my_plugins/lib
-cp third-part.jar plugins/my_plugins/lib
-```
-
-`my_plugins` can be any string.
## How do I configure logging-related parameters in SeaTunnel-v1(Spark)?
@@ -298,50 +166,5 @@
http://spark.apache.org/docs/latest/configuration.html#configuring-logging
https://medium.com/@iacomini.riccardo/spark-logging-configuration-in-yarn-faf5ba5fdb01
-## Error when writing to ClickHouse: ClassCastException
-
-In SeaTunnel, the data type will not be actively converted. After the Input reads the data, the corresponding
-Schema. When writing ClickHouse, the field type needs to be strictly matched, and the mismatch needs to be resolved.
-
-Data conversion can be achieved through the following two plug-ins:
-
-1. Filter Convert plugin
-2. Filter Sql plugin
-
-Detailed data type conversion reference: [ClickHouse Data Type Check List](https://interestinglab.github.io/seatunnel-docs/#/en/configuration/output-plugins/Clickhouse?id=clickhouse-data-type-check-list)
-
-Refer to issue:[#488](https://github.com/apache/seatunnel/issues/488) [#382](https://github.com/apache/seatunnel/issues/382).
-
-## How does SeaTunnel access kerberos-authenticated HDFS, YARN, Hive and other resources?
-
-Please refer to: [#590](https://github.com/apache/seatunnel/issues/590).
-
-## How do I troubleshoot NoClassDefFoundError, ClassNotFoundException and other issues?
-
-There is a high probability that there are multiple different versions of the corresponding Jar package class loaded in the Java classpath, because of the conflict of the load order, not because the Jar is really missing. Modify this SeaTunnel startup command, adding the following parameters to the spark-submit submission section, and debug in detail through the output log.
-
-```
-spark-submit --verbose
- ...
- --conf 'spark.driver.extraJavaOptions=-verbose:class'
- --conf 'spark.executor.extraJavaOptions=-verbose:class'
- ...
-```
-
-## How do I use SeaTunnel to synchronize data across HDFS clusters?
-
-Just configure hdfs-site.xml properly. Refer to: https://www.cnblogs.com/suanec/p/7828139.html.
-
-## I want to learn the source code of SeaTunnel. Where should I start?
-
-SeaTunnel has a completely abstract and structured code implementation, and many people have chosen SeaTunnel As a way to learn Spark. You can learn the source code from the main program entry: Seatunnel.java
-
-## When SeaTunnel developers develop their own plugins, do they need to understand the SeaTunnel code? Should these plugins be integrated into the SeaTunnel project?
-
-The plugin developed by the developer has nothing to do with the SeaTunnel project and does not need to include your plugin code.
-
-The plugin can be completely independent from SeaTunnel project, so you can write it using Java, Scala, Maven, sbt, Gradle, or whatever you want. This is also the way we recommend developers to develop plugins.
-## When I import a project, the compiler has the exception "class not found `org.apache.seatunnel.shade.com.typesafe.config.Config`"
-Run `mvn install` first. In the `seatunnel-config/seatunnel-config-base` subproject, the package `com.typesafe.config` has been relocated to `org.apache.seatunnel.shade.com.typesafe.config` and installed to the maven local repository in the subproject `seatunnel-config/seatunnel-config-shade`.