(fury-site) branch main updated: docs: translate guide docs (#169)

chaokunyang Sat, 24 Aug 2024 10:00:36 -0700

This is an automated email from the ASF dual-hosted git repository.

chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fury-site.git



The following commit(s) were added to refs/heads/main by this push:
     new bbd8741  docs: translate guide docs (#169)
bbd8741 is described below

commit bbd8741d0b5921a2664809198146ba22575ed569
Author: shown <[email protected]>
AuthorDate: Sun Aug 25 01:00:10 2024 +0800

    docs: translate guide docs (#169)
    
    Signed-off-by: yuluo-yx <[email protected]>
    Co-authored-by: Shawn Yang <[email protected]>
---
 docs/guide/scala_guide.md                          |   7 +-
 .../current/guide/row_format_guide.md              | 139 +++++++++++++++++++++
 .../current/guide/scala_guide.md                   | 138 ++++++++++++++++++++
 3 files changed, 281 insertions(+), 3 deletions(-)

diff --git a/docs/guide/scala_guide.md b/docs/guide/scala_guide.md
index 4de2f09..8e8229c 100644
--- a/docs/guide/scala_guide.md
+++ b/docs/guide/scala_guide.md
@@ -40,11 +40,12 @@ fury.register(Class.forName("scala.Enumeration.Val"))
 ```
 
 If you want to avoid such registration, you can disable class registration by `FuryBuilder#requireClassRegistration(false)`.
-Note that this option allow to deserialize objects unknown types, more flexible but may be insecure if the classes contains malicious code.
 
-And circular references are common in scala, `Reference tracking` should be enabled by `FuryBuilder#withRefTracking(true)`. If you don't enable reference tracking, [StackOverflowError](https://github.com/apache/fury/issues/1032) may happen for some scala versions when serializing scala Enumeration.
+> Note that this option allow to deserialize objects unknown types, more flexible but may be insecure if the classes contains malicious code.
 
-Note that fury instance should be shared between multiple serialization, the creation of fury instance is not cheap.
+And circular references are common in scala, `Reference tracking` should be enabled by `FuryBuilder#withRefTracking(true)`. If you don't enable `Reference tracking`, [StackOverflowError](https://github.com/apache/fury/issues/1032) may happen for some scala versions when serializing scala Enumeration.
+
+> Note that fury instance should be shared between multiple serialization, the creation of fury instance is not cheap.
 
 If you use shared fury instance across multiple threads, you should create `ThreadSafeFury` instead by `FuryBuilder#buildThreadSafeFury()` instead.
 
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/row_format_guide.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/row_format_guide.md
new file mode 100644
index 0000000..632f644
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/row_format_guide.md
@@ -0,0 +1,139 @@
+---
+title: Row Format Guide
+sidebar_position: 1
+id: row_format_guide
+---
+
+## Row format protocol
+
+### Java
+
+```java
+public class Bar {
+  String f1;
+  List<Long> f2;
+}
+
+public class Foo {
+  int f1;
+  List<Integer> f2;
+  Map<String, Integer> f3;
+  List<Bar> f4;
+}
+
+RowEncoder<Foo> encoder = Encoders.bean(Foo.class);
+Foo foo = new Foo();
+foo.f1 = 10;
+foo.f2 = IntStream.range(0, 1000000).boxed().collect(Collectors.toList());
+foo.f3 = IntStream.range(0, 1000000).boxed().collect(Collectors.toMap(i -> "k"+i, i->i));
+List<Bar> bars = new ArrayList<>(1000000);
+for (int i = 0; i < 1000000; i++) {
+  Bar bar = new Bar();
+  bar.f1 = "s"+i;
+  bar.f2 = LongStream.range(0, 10).boxed().collect(Collectors.toList());
+  bars.add(bar);
+}
+foo.f4 = bars;
+// Can be zero-copy read by python
+BinaryRow binaryRow = encoder.toRow(foo);
+// can be data from python
+Foo newFoo = encoder.fromRow(binaryRow);
+// zero-copy read List<Integer> f2
+BinaryArray binaryArray2 = binaryRow.getArray(1);
+// zero-copy read List<Bar> f4
+BinaryArray binaryArray4 = binaryRow.getArray(3);
+// zero-copy read 11th element of `List<Bar> f4`
+BinaryRow barStruct = binaryArray4.getStruct(10);
+
+// zero-copy read 6th element of f2 of the 11th element of `List<Bar> f4`
+barStruct.getArray(1).getInt64(5);
+RowEncoder<Bar> barEncoder = Encoders.bean(Bar.class);
+// deserialize part of data.
+Bar newBar = barEncoder.fromRow(barStruct);
+Bar newBar2 = barEncoder.fromRow(binaryArray4.getStruct(20));
+```
+
+### Python
+
+```python
+@dataclass
+class Bar:
+    f1: str
+    f2: List[pa.int64]
+@dataclass
+class Foo:
+    f1: pa.int32
+    f2: List[pa.int32]
+    f3: Dict[str, pa.int32]
+    f4: List[Bar]
+
+encoder = pyfury.encoder(Foo)
+foo = Foo(f1=10, f2=list(range(1000_000)),
+         f3={f"k{i}": i for i in range(1000_000)},
+         f4=[Bar(f1=f"s{i}", f2=list(range(10))) for i in range(1000_000)])
+binary: bytes = encoder.to_row(foo).to_bytes()
+print(f"start: {datetime.datetime.now()}")
+foo_row = pyfury.RowData(encoder.schema, binary)
+print(foo_row.f2[100000], foo_row.f4[100000].f1, foo_row.f4[200000].f2[5])
+print(f"end: {datetime.datetime.now()}")
+
+binary = pickle.dumps(foo)
+print(f"pickle start: {datetime.datetime.now()}")
+new_foo = pickle.loads(binary)
+print(new_foo.f2[100000], new_foo.f4[100000].f1, new_foo.f4[200000].f2[5])
+print(f"pickle end: {datetime.datetime.now()}")
+```
+
+### Apache Arrow Support
+
+Apache Fury Format also supports automatic conversion to and from Arrow Table/RecordBatch.
+
+Java:
+
+```java
+Schema schema = TypeInference.inferSchema(BeanA.class);
+ArrowWriter arrowWriter = ArrowUtils.createArrowWriter(schema);
+Encoder<BeanA> encoder = Encoders.rowEncoder(BeanA.class);
+for (int i = 0; i < 10; i++) {
+  BeanA beanA = BeanA.createBeanA(2);
+  arrowWriter.write(encoder.toRow(beanA));
+}
+return arrowWriter.finishAsRecordBatch();
+```
+
+Python:
+
+```python
+import pyfury
+encoder = pyfury.encoder(Foo)
+encoder.to_arrow_record_batch([foo] * 10000)
+encoder.to_arrow_table([foo] * 10000)
+```
+
+C++:
+
+```c++
+std::shared_ptr<ArrowWriter> arrow_writer;
+EXPECT_TRUE(
+    ArrowWriter::Make(schema, ::arrow::default_memory_pool(), &arrow_writer)
+        .ok());
+for (auto &row : rows) {
+  EXPECT_TRUE(arrow_writer->Write(row).ok());
+}
+std::shared_ptr<::arrow::RecordBatch> record_batch;
+EXPECT_TRUE(arrow_writer->Finish(&record_batch).ok());
+EXPECT_TRUE(record_batch->Validate().ok());
+EXPECT_EQ(record_batch->num_columns(), schema->num_fields());
+EXPECT_EQ(record_batch->num_rows(), row_nums);
+```
+
+```java
+Schema schema = TypeInference.inferSchema(BeanA.class);
+ArrowWriter arrowWriter = ArrowUtils.createArrowWriter(schema);
+Encoder<BeanA> encoder = Encoders.rowEncoder(BeanA.class);
+for (int i = 0; i < 10; i++) {
+  BeanA beanA = BeanA.createBeanA(2);
+  arrowWriter.write(encoder.toRow(beanA));
+}
+return arrowWriter.finishAsRecordBatch();
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/scala_guide.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/scala_guide.md
new file mode 100644
index 0000000..40fac80
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/scala_guide.md
@@ -0,0 +1,138 @@
+---
+title: Scala Serialization Guide
+sidebar_position: 4
+id: scala_guide
+---
+
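+Apache Fury supports serialization of all Scala objects:
+
+- `case` class serialization is supported;
+- `pojo/bean` class serialization is supported;
+- `object` singleton serialization is supported;
+- `collection` serialization is supported;
+- other types such as `tuple/either` and basic types are also supported.
+
+Both Scala 2 and Scala 3 are supported.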
+
+## Install
+
+```sbt
+libraryDependencies += "org.apache.fury" % "fury-core" % "0.7.0"
+```
+
+## Fury creation
+
+When using Apache Fury for Scala serialization, you should create the Fury object with at least the following options:
+
+```scala
+val fury = Fury.builder()
+  .withScalaOptimizationEnabled(true)
+  .requireClassRegistration(true)
+  .withRefTracking(true)
+  .build()
+```
+
+Depending on the object types you serialize, you may need to register some Scala internal types:
+
+```scala
+fury.register(Class.forName("scala.collection.generic.DefaultSerializationProxy"))
+fury.register(Class.forName("scala.Enumeration.Val"))
+```
+
+To avoid such registration, you can disable class registration with `FuryBuilder#requireClassRegistration(false)`.
+
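+> Note: this option allows objects of unknown types to be deserialized, which is more flexible but may be insecure if the classes contain malicious code.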
+
+Circular references are common in Scala, so `Reference tracking` should be enabled with `FuryBuilder#withRefTracking(true)`. If you don't enable `Reference tracking`, a [StackOverflowError](https://github.com/apache/fury/issues/1032) may occur on some Scala versions when serializing a Scala Enumeration.
+
+> Note: a Fury instance should be shared across multiple serializations. Creating a Fury instance is expensive, so reuse it whenever possible.
+
+If you share a Fury instance across multiple threads, you should create a `ThreadSafeFury` with `FuryBuilder#buildThreadSafeFury()` instead.
+
+## Serialize case object
+
+```scala
+case class Person(github: String, age: Int, id: Long)
+val p = Person("https://github.com/chaokunyang", 18, 1)
+println(fury.deserialize(fury.serialize(p)))
+println(fury.deserializeJavaObject(fury.serializeJavaObject(p)))
+```
+
+## Serialize pojo
+
+```scala
+class Foo(f1: Int, f2: String) {
+  override def toString: String = s"Foo($f1, $f2)"
+}
+println(fury.deserialize(fury.serialize(new Foo(1, "chaokunyang"))))
+```
+
+## Serialize object singleton
+
+```scala
+object singleton {
+}
+val o1 = fury.deserialize(fury.serialize(singleton))
+val o2 = fury.deserialize(fury.serialize(singleton))
+println(o1 == o2)
+```
+
+## Serialize collection
+
+```scala
+val seq = Seq(1,2)
+val list = List("a", "b")
+val map = Map("a" -> 1, "b" -> 2)
+println(fury.deserialize(fury.serialize(seq)))
+println(fury.deserialize(fury.serialize(list)))
+println(fury.deserialize(fury.serialize(map)))
+```
+
+## Serialize Tuple
+
+```scala
+val tuple = Tuple2(100, 10000L)
+println(fury.deserialize(fury.serialize(tuple)))
+val tuple4 = Tuple4(100, 10000L, 10000L, "str")
+println(fury.deserialize(fury.serialize(tuple4)))
+```
+
+## Serialize Enum
+
+### Scala3 Enum
+
+```scala
+enum Color { case Red, Green, Blue }
+println(fury.deserialize(fury.serialize(Color.Green)))
+```
+
+### Scala2 Enum
+
+```scala
+object ColorEnum extends Enumeration {
+  type ColorEnum = Value
+  val Red, Green, Blue = Value
+}
+println(fury.deserialize(fury.serialize(ColorEnum.Green)))
+```
+
+## Serialize Option
+
+```scala
+val opt: Option[Long] = Some(100)
+println(fury.deserialize(fury.serialize(opt)))
+val opt1: Option[Long] = None
+println(fury.deserialize(fury.serialize(opt1)))
+```
+
+## Performance
+
+Scala `pojo/bean/case/object` types are well supported by Apache Fury JIT, and their performance is as good as Apache Fury Java.
+
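+Scala collections and generics don't follow the Java collection framework and are not fully integrated with Apache Fury JIT in the current release. Their performance won't be as good as Fury collections serialization for Java.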
+
+Serializing Scala collections invokes the Java serialization API `writeObject/readObject/writeReplace/readResolve/readObjectNoData/Externalizable` through Fury's `ObjectStream` implementation. Although `org.apache.fury.serializer.ObjectStreamSerializer` is much faster than JDK `ObjectOutputStream/ObjectInputStream`, it still doesn't know how to use Scala collection generics.
+
+We plan to provide more optimizations for Scala types in the future, so stay tuned. See [#682](https://github.com/apache/fury/issues/682) for more information!
+
+Scala collection serialization was completed in [#1073](https://github.com/apache/fury/pull/1073). If you want better performance, please use the Apache Fury snapshot version.
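
A note on the reference tracking advice in the Scala guide above: the sketch below is a plain-Python illustration of why a serializer has to remember objects it has already visited when the object graph contains a cycle. It is not Fury's implementation; `encode`, `encode_naive`, and the `("ref", n)` marker are hypothetical names chosen for this example only.

```python
def encode(obj, seen=None):
    """Walk nested lists, replacing already-visited lists with a back-reference."""
    if seen is None:
        seen = {}
    if not isinstance(obj, list):
        return obj
    if id(obj) in seen:
        # Already visited: emit a reference id instead of descending again.
        return ("ref", seen[id(obj)])
    seen[id(obj)] = len(seen)
    return ("list", [encode(item, seen) for item in obj])


def encode_naive(obj):
    """Same walk without tracking; it never terminates on a cyclic graph."""
    if not isinstance(obj, list):
        return obj
    return ("list", [encode_naive(item) for item in obj])


# A list that contains itself, i.e. a circular reference.
cycle = [1]
cycle.append(cycle)

tracked = encode(cycle)

try:
    encode_naive(cycle)
    naive_failed = False
except RecursionError:
    naive_failed = True
```

The untracked walk recurses until Python's recursion limit is hit, which is the analogue of the StackOverflowError linked in the guide, while the tracked walk emits a back-reference and finishes.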

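On the zero-copy reads shown in the row format guide (for example, reading the 6th element of `f2` without rebuilding the whole list): here is a minimal sketch of the general idea, assuming a made-up layout of an 8-byte little-endian count followed by fixed-width 8-byte slots. The layout and the helper names `pack_int64_array` and `read_int64_at` are assumptions for illustration only, not Fury's actual row format or API.

```python
import struct


def pack_int64_array(values):
    """Write an 8-byte element count followed by one 8-byte slot per value."""
    values = list(values)
    return struct.pack(f"<q{len(values)}q", len(values), *values)


def read_int64_at(buf, index):
    """Read a single element by offset, without decoding any other element."""
    view = memoryview(buf)  # a view over the buffer; no bytes are copied
    (count,) = struct.unpack_from("<q", view, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    # Fixed-width slots make the offset a simple multiplication.
    (value,) = struct.unpack_from("<q", view, 8 + 8 * index)
    return value


buf = pack_int64_array(range(1000))
sixth = read_int64_at(buf, 5)
```

Because every slot has the same width, the cost of reading one element does not depend on how many elements the buffer holds, which is the property the random-access calls in the guide rely on.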
