[seatunnel-website] branch main updated: Update faq.md (#245)

lidongdai Sun, 25 Jun 2023 03:48:16 -0700

This is an automated email from the ASF dual-hosted git repository.

lidongdai pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/seatunnel-website.git



The following commit(s) were added to refs/heads/main by this push:
     new d5c444accd2 Update faq.md (#245)
d5c444accd2 is described below

commit d5c444accd2e3dc3863f9601d0137c1658e185fd
Author: David Zollo <[email protected]>
AuthorDate: Sun Jun 25 18:48:01 2023 +0800

    Update faq.md (#245)
---
 versioned_docs/version-2.3.2/faq.md | 177 ------------------------------------
 1 file changed, 177 deletions(-)

diff --git a/versioned_docs/version-2.3.2/faq.md b/versioned_docs/version-2.3.2/faq.md
index 6903043e087..58945d3b02d 100644
--- a/versioned_docs/version-2.3.2/faq.md
+++ b/versioned_docs/version-2.3.2/faq.md
@@ -1,8 +1,5 @@
 # FAQs
 
-## Why should I install a computing engine like Spark or Flink?
-
-SeaTunnel now uses computing engines such as Spark and Flink to complete resource scheduling and node communication, so we can focus on the ease of use of data synchronization and the development of high-performance components. But this is only temporary.
 
 ## I have a question, and I cannot solve it by myself
 
@@ -61,13 +58,6 @@ your string 1
 
 Refer to: [lightbend/config#456](https://github.com/lightbend/config/issues/456).
 
-## Is SeaTunnel supportted in Azkaban, Oozie, DolphinScheduler?
-
-Of course! See the screenshot below:
-
-![workflow.png](images/workflow.png)
-
-![azkaban.png](images/azkaban.png)
 
 ## Does SeaTunnel have a case for configuring multiple sources, such as configuring elasticsearch and hdfs in source at the same time?
 
@@ -91,117 +81,6 @@ sink {
 }
 ```
 
-## Are there any HBase plugins?
-
-There is an hbase input plugin. You can download it from here: https://github.com/garyelephant/waterdrop-input-hbase .
-
-## How can I use SeaTunnel to write data to Hive?
-
-```
-env {
-  spark.sql.catalogImplementation = "hive"
-  spark.hadoop.hive.exec.dynamic.partition = "true"
-  spark.hadoop.hive.exec.dynamic.partition.mode = "nonstrict"
-}
-
-source {
-  sql = "insert into ..."
-}
-
-sink {
-    // The data has been written to hive through the sql source. This is just a placeholder, it does not actually work.
-    stdout {
-        limit = 1
-    }
-}
-```
-
-In addition, SeaTunnel has implemented a `Hive` output plugin after version `1.5.7` in `1.x` branch; in `2.x` branch. The Hive plugin for the Spark engine has been supported from version `2.0.5`: https://github.com/apache/seatunnel/issues/910.
-
-## How does SeaTunnel write multiple instances of ClickHouse to achieve load balancing?
-
-1. Write distributed tables directly (not recommended)
-
-2. Add a proxy or domain name (DNS) in front of multiple instances of ClickHouse:
-
-   ```
-   {
-       output {
-           clickhouse {
-               host = "ck-proxy.xx.xx:8123"
-               # Local table
-               table = "table_name"
-           }
-       }
-   }
-   ```
-3. Configure multiple instances in the configuration:
-
-   ```
-   {
-       output {
-           clickhouse {
-               host = "ck1:8123,ck2:8123,ck3:8123"
-               # Local table
-               table = "table_name"
-           }
-       }
-   }
-   ```
-4. Use cluster mode:
-
-   ```
-   {
-       output {
-           clickhouse {
-               # Configure only one host
-               host = "ck1:8123"
-               cluster = "clickhouse_cluster_name"
-               # Local table
-               table = "table_name"
-           }
-       }
-   }
-   ```
-
-## How can I solve OOM when SeaTunnel consumes Kafka?
-
-In most cases, OOM is caused by not having a rate limit for consumption. The solution is as follows:
-
-For the current limit of Spark consumption of Kafka:
-
-1. Suppose the number of partitions of Kafka `Topic 1` you consume with KafkaStream = N.
-
-2. Assuming that the production speed of the message producer (Producer) of `Topic 1` is K messages/second, the speed of write messages to the partition must be uniform.
-
-3. Suppose that, after testing, it is found that the processing capacity of Spark Executor per core per second is M.
-
-The following conclusions can be drawn:
-
-1. If you want to make Spark's consumption of `Topic 1` keep up with its production speed, then you need `spark.executor.cores` * `spark.executor.instances` >= K / M
-
-2. When a data delay occurs, if you want the consumption speed not to be too fast, resulting in spark executor OOM, then you need to configure `spark.streaming.kafka.maxRatePerPartition` <= (`spark.executor.cores` * `spark.executor.instances`) * M / N
-
-3. In general, both M and N are determined, and the conclusion can be drawn from 2: The size of `spark.streaming.kafka.maxRatePerPartition` is positively correlated with the size of `spark.executor.cores` * `spark.executor.instances`, and it can be increased while increasing the resource `maxRatePerPartition` to speed up consumption.
-
-![kafka](images/kafka.png)
-
-## How can I solve the Error `Exception in thread "main" java.lang.NoSuchFieldError: INSTANCE`?
-
-The reason is that the version of httpclient.jar that comes with the CDH version of Spark is lower, and The httpclient version that ClickHouse JDBC is based on is 4.5.2, and the package versions conflict. The solution is to replace the jar package that comes with CDH with the httpclient-4.5.2 version.
-
-## The default JDK of my Spark cluster is JDK7. After I install JDK8, how can I specify that SeaTunnel starts with JDK8?
-
-In SeaTunnel's config file, specify the following configuration:
-
-```shell
-spark {
- ...
- spark.executorEnv.JAVA_HOME="/your/java_8_home/directory"
- spark.yarn.appMasterEnv.JAVA_HOME="/your/java_8_home/directory"
- ...
-}
-```
 
 ## How do I specify a different JDK version for SeaTunnel on Yarn?
 
@@ -224,17 +103,6 @@ For example, if you want to set the JDK version to JDK8, there are two cases:
 
 If you run in local mode, you need to modify the `start-seatunnel.sh` startup script. After `spark-submit`, add a parameter `--driver-memory 4g` . Under normal circumstances, local mode is not used in the production environment. Therefore, this parameter generally does not need to be set during On Yarn. See: [Application Properties](https://spark.apache.org/docs/latest/configuration.html#application-properties) for details.
 
-## Where can I place self-written plugins or third-party jdbc.jars to be loaded by SeaTunnel?
-
-Place the Jar package under the specified structure of the plugins directory:
-
-```bash
-cd SeaTunnel
-mkdir -p plugins/my_plugins/lib
-cp third-part.jar plugins/my_plugins/lib
-```
-
-`my_plugins` can be any string.
 
 ## How do I configure logging-related parameters in SeaTunnel-v1(Spark)?
 
@@ -298,50 +166,5 @@ http://spark.apache.org/docs/latest/configuration.html#configuring-logging
 
 https://medium.com/@iacomini.riccardo/spark-logging-configuration-in-yarn-faf5ba5fdb01
 
-## Error when writing to ClickHouse: ClassCastException
-
-In SeaTunnel, the data type will not be actively converted. After the Input reads the data, the corresponding
-Schema. When writing ClickHouse, the field type needs to be strictly matched, and the mismatch needs to be resolved.
-
-Data conversion can be achieved through the following two plug-ins:
-
-1. Filter Convert plugin
-2. Filter Sql plugin
-
-Detailed data type conversion reference: [ClickHouse Data Type Check List](https://interestinglab.github.io/seatunnel-docs/#/en/configuration/output-plugins/Clickhouse?id=clickhouse-data-type-check-list)
-
-Refer to issue:[#488](https://github.com/apache/seatunnel/issues/488) [#382](https://github.com/apache/seatunnel/issues/382).
-
-## How does SeaTunnel access kerberos-authenticated HDFS, YARN, Hive and other resources?
-
-Please refer to: [#590](https://github.com/apache/seatunnel/issues/590).
-
-## How do I troubleshoot NoClassDefFoundError, ClassNotFoundException and other issues?
-
-There is a high probability that there are multiple different versions of the corresponding Jar package class loaded in the Java classpath, because of the conflict of the load order, not because the Jar is really missing. Modify this SeaTunnel startup command, adding the following parameters to the spark-submit submission section, and debug in detail through the output log.
-
-```
-spark-submit --verbose
-    ...
-   --conf 'spark.driver.extraJavaOptions=-verbose:class'
-   --conf 'spark.executor.extraJavaOptions=-verbose:class'
-    ...
-```
-
-## How do I use SeaTunnel to synchronize data across HDFS clusters?
-
-Just configure hdfs-site.xml properly. Refer to: https://www.cnblogs.com/suanec/p/7828139.html.
-
-## I want to learn the source code of SeaTunnel. Where should I start?
-
-SeaTunnel has a completely abstract and structured code implementation, and many people have chosen SeaTunnel As a way to learn Spark. You can learn the source code from the main program entry: Seatunnel.java
-
-## When SeaTunnel developers develop their own plugins, do they need to understand the SeaTunnel code? Should these plugins be integrated into the SeaTunnel project?
-
-The plugin developed by the developer has nothing to do with the SeaTunnel project and does not need to include your plugin code.
-
-The plugin can be completely independent from SeaTunnel project, so you can write it using Java, Scala, Maven, sbt, Gradle, or whatever you want. This is also the way we recommend developers to develop plugins.
 
-## When I import a project, the compiler has the exception "class not found `org.apache.seatunnel.shade.com.typesafe.config.Config`"
 
-Run `mvn install` first. In the `seatunnel-config/seatunnel-config-base` subproject, the package `com.typesafe.config` has been relocated to `org.apache.seatunnel.shade.com.typesafe.config` and installed to the maven local repository in the subproject `seatunnel-config/seatunnel-config-shade`.
