This is an automated email from the ASF dual-hosted git repository.
chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fury-site.git
The following commit(s) were added to refs/heads/main by this push:
new bbd8741 docs: translate guide docs (#169)
bbd8741 is described below
commit bbd8741d0b5921a2664809198146ba22575ed569
Author: shown <[email protected]>
AuthorDate: Sun Aug 25 01:00:10 2024 +0800
docs: translate guide docs (#169)
Signed-off-by: yuluo-yx <[email protected]>
Co-authored-by: Shawn Yang <[email protected]>
---
docs/guide/scala_guide.md | 7 +-
.../current/guide/row_format_guide.md | 139 +++++++++++++++++++++
.../current/guide/scala_guide.md | 138 ++++++++++++++++++++
3 files changed, 281 insertions(+), 3 deletions(-)
diff --git a/docs/guide/scala_guide.md b/docs/guide/scala_guide.md
index 4de2f09..8e8229c 100644
--- a/docs/guide/scala_guide.md
+++ b/docs/guide/scala_guide.md
@@ -40,11 +40,12 @@ fury.register(Class.forName("scala.Enumeration.Val"))
```
If you want to avoid such registration, you can disable class registration by `FuryBuilder#requireClassRegistration(false)`.
-Note that this option allow to deserialize objects unknown types, more flexible but may be insecure if the classes contains malicious code.
-And circular references are common in scala, `Reference tracking` should be enabled by `FuryBuilder#withRefTracking(true)`. If you don't enable reference tracking, [StackOverflowError](https://github.com/apache/fury/issues/1032) may happen for some scala versions when serializing scala Enumeration.
+> Note that this option allow to deserialize objects unknown types, more flexible but may be insecure if the classes contains malicious code.
-Note that fury instance should be shared between multiple serialization, the creation of fury instance is not cheap.
+And circular references are common in scala, `Reference tracking` should be enabled by `FuryBuilder#withRefTracking(true)`. If you don't enable `Reference tracking`, [StackOverflowError](https://github.com/apache/fury/issues/1032) may happen for some scala versions when serializing scala Enumeration.
+
+> Note that fury instance should be shared between multiple serialization, the creation of fury instance is not cheap.
If you use shared fury instance across multiple threads, you should create `ThreadSafeFury` instead by `FuryBuilder#buildThreadSafeFury()` instead.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/row_format_guide.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/row_format_guide.md
new file mode 100644
index 0000000..632f644
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/row_format_guide.md
@@ -0,0 +1,139 @@
+---
+title: Row Format Guide
+sidebar_position: 1
+id: row_format_guide
+---
+
+## Row format protocol
+
+### Java
+
+```java
+public class Bar {
+ String f1;
+ List<Long> f2;
+}
+
+public class Foo {
+ int f1;
+ List<Integer> f2;
+ Map<String, Integer> f3;
+ List<Bar> f4;
+}
+
+RowEncoder<Foo> encoder = Encoders.bean(Foo.class);
+Foo foo = new Foo();
+foo.f1 = 10;
+foo.f2 = IntStream.range(0, 1000000).boxed().collect(Collectors.toList());
+foo.f3 = IntStream.range(0, 1000000).boxed().collect(Collectors.toMap(i -> "k"+i, i->i));
+List<Bar> bars = new ArrayList<>(1000000);
+for (int i = 0; i < 1000000; i++) {
+ Bar bar = new Bar();
+ bar.f1 = "s"+i;
+ bar.f2 = LongStream.range(0, 10).boxed().collect(Collectors.toList());
+ bars.add(bar);
+}
+foo.f4 = bars;
+// Can be zero-copy read by python
+BinaryRow binaryRow = encoder.toRow(foo);
+// can be data from python
+Foo newFoo = encoder.fromRow(binaryRow);
+// zero-copy read List<Integer> f2
+BinaryArray binaryArray2 = binaryRow.getArray(1);
+// zero-copy read List<Bar> f4
+BinaryArray binaryArray4 = binaryRow.getArray(3);
+// zero-copy read the 11th element of `List<Bar> f4`
+BinaryRow barStruct = binaryArray4.getStruct(10);
+
+// zero-copy read the 6th element of f2 of the 11th element of `List<Bar> f4`
+barStruct.getArray(1).getInt64(5);
+RowEncoder<Bar> barEncoder = Encoders.bean(Bar.class);
+// deserialize part of data.
+Bar newBar = barEncoder.fromRow(barStruct);
+Bar newBar2 = barEncoder.fromRow(binaryArray4.getStruct(20));
+```
+
+### Python
+
+```python
+@dataclass
+class Bar:
+ f1: str
+ f2: List[pa.int64]
+@dataclass
+class Foo:
+ f1: pa.int32
+ f2: List[pa.int32]
+ f3: Dict[str, pa.int32]
+ f4: List[Bar]
+
+encoder = pyfury.encoder(Foo)
+foo = Foo(f1=10, f2=list(range(1000_000)),
+ f3={f"k{i}": i for i in range(1000_000)},
+ f4=[Bar(f1=f"s{i}", f2=list(range(10))) for i in range(1000_000)])
+binary: bytes = encoder.to_row(foo).to_bytes()
+print(f"start: {datetime.datetime.now()}")
+foo_row = pyfury.RowData(encoder.schema, binary)
+print(foo_row.f2[100000], foo_row.f4[100000].f1, foo_row.f4[200000].f2[5])
+print(f"end: {datetime.datetime.now()}")
+
+binary = pickle.dumps(foo)
+print(f"pickle start: {datetime.datetime.now()}")
+new_foo = pickle.loads(binary)
+print(new_foo.f2[100000], new_foo.f4[100000].f1, new_foo.f4[200000].f2[5])
+print(f"pickle end: {datetime.datetime.now()}")
+```
+
+### Apache Arrow Support
+
+Apache Fury Format also supports automatic conversion to and from Arrow Table/RecordBatch.
+
+Java:
+
+```java
+Schema schema = TypeInference.inferSchema(BeanA.class);
+ArrowWriter arrowWriter = ArrowUtils.createArrowWriter(schema);
+Encoder<BeanA> encoder = Encoders.rowEncoder(BeanA.class);
+for (int i = 0; i < 10; i++) {
+ BeanA beanA = BeanA.createBeanA(2);
+ arrowWriter.write(encoder.toRow(beanA));
+}
+return arrowWriter.finishAsRecordBatch();
+```
+
+Python:
+
+```python
+import pyfury
+encoder = pyfury.encoder(Foo)
+encoder.to_arrow_record_batch([foo] * 10000)
+encoder.to_arrow_table([foo] * 10000)
+```
+
+C++:
+
+```c++
+std::shared_ptr<ArrowWriter> arrow_writer;
+EXPECT_TRUE(
+ ArrowWriter::Make(schema, ::arrow::default_memory_pool(), &arrow_writer)
+ .ok());
+for (auto &row : rows) {
+ EXPECT_TRUE(arrow_writer->Write(row).ok());
+}
+std::shared_ptr<::arrow::RecordBatch> record_batch;
+EXPECT_TRUE(arrow_writer->Finish(&record_batch).ok());
+EXPECT_TRUE(record_batch->Validate().ok());
+EXPECT_EQ(record_batch->num_columns(), schema->num_fields());
+EXPECT_EQ(record_batch->num_rows(), row_nums);
+```
+
+```java
+Schema schema = TypeInference.inferSchema(BeanA.class);
+ArrowWriter arrowWriter = ArrowUtils.createArrowWriter(schema);
+Encoder<BeanA> encoder = Encoders.rowEncoder(BeanA.class);
+for (int i = 0; i < 10; i++) {
+ BeanA beanA = BeanA.createBeanA(2);
+ arrowWriter.write(encoder.toRow(beanA));
+}
+return arrowWriter.finishAsRecordBatch();
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/scala_guide.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/scala_guide.md
new file mode 100644
index 0000000..40fac80
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/guide/scala_guide.md
@@ -0,0 +1,138 @@
+---
+title: Scala Serialization Guide
+sidebar_position: 4
+id: scala_guide
+---
+
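+Apache Fury supports serialization of all Scala objects:
+
+- `case` class serialization is supported;
+- `pojo/bean` class serialization is supported;
+- `object` singleton serialization is supported;
+- `collection` serialization is supported;
+- other types, such as `tuple/either` and basic types, are supported too.
+
+Both Scala 2 and Scala 3 are supported.
+
+## Install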
+
+```sbt
+libraryDependencies += "org.apache.fury" % "fury-core" % "0.7.0"
+```
+
+## Fury creation
+
+When using Apache Fury for Scala serialization, you should create the Fury object with at least the following options:
+
+```scala
+val fury = Fury.builder()
+ .withScalaOptimizationEnabled(true)
+ .requireClassRegistration(true)
+ .withRefTracking(true)
+ .build()
+```
+
+Depending on the object types you serialize, you may need to register some Scala internal types:
+
+```scala
+fury.register(Class.forName("scala.collection.generic.DefaultSerializationProxy"))
+fury.register(Class.forName("scala.Enumeration.Val"))
+```
+
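+If you want to avoid such registration, you can disable class registration with `FuryBuilder#requireClassRegistration(false)`.
+
+> Note: this option allows deserializing objects of unknown types. It is more flexible, but it may be insecure if the classes contain malicious code.
+
+Circular references are common in Scala, so `Reference tracking` should be enabled with `FuryBuilder#withRefTracking(true)`. If you don't enable `Reference tracking`, a [StackOverflowError](https://github.com/apache/fury/issues/1032) may occur in some Scala versions when serializing a Scala Enumeration.
+
+> Note: a Fury instance should be shared across multiple serializations, because creating a Fury instance is not cheap.
+
+If you share a Fury instance across multiple threads, you should create a `ThreadSafeFury` with `FuryBuilder#buildThreadSafeFury()` instead.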
+
+## Serialize case class
+
+```scala
+case class Person(github: String, age: Int, id: Long)
+val p = Person("https://github.com/chaokunyang", 18, 1)
+println(fury.deserialize(fury.serialize(p)))
+println(fury.deserializeJavaObject(fury.serializeJavaObject(p)))
+```
+
+## Serialize pojo
+
+```scala
+class Foo(f1: Int, f2: String) {
+ override def toString: String = s"Foo($f1, $f2)"
+}
+println(fury.deserialize(fury.serialize(new Foo(1, "chaokunyang"))))
+```
+
+## Serialize object singleton
+
+```scala
+object singleton {
+}
+val o1 = fury.deserialize(fury.serialize(singleton))
+val o2 = fury.deserialize(fury.serialize(singleton))
+println(o1 == o2)
+```
+
+## Serialize collection
+
+```scala
+val seq = Seq(1,2)
+val list = List("a", "b")
+val map = Map("a" -> 1, "b" -> 2)
+println(fury.deserialize(fury.serialize(seq)))
+println(fury.deserialize(fury.serialize(list)))
+println(fury.deserialize(fury.serialize(map)))
+```
+
+## Serialize Tuple
+
+```scala
+val tuple = Tuple2(100, 10000L)
+println(fury.deserialize(fury.serialize(tuple)))
+val tuple4 = Tuple4(100, 10000L, 10000L, "str")
+println(fury.deserialize(fury.serialize(tuple4)))
+```
+
+## Serialize Enum
+
+### Scala 3 Enum
+
+```scala
+enum Color { case Red, Green, Blue }
+println(fury.deserialize(fury.serialize(Color.Green)))
+```
+
+### Scala 2 Enum
+
+```scala
+object ColorEnum extends Enumeration {
+ type ColorEnum = Value
+ val Red, Green, Blue = Value
+}
+println(fury.deserialize(fury.serialize(ColorEnum.Green)))
+```
+
+## Serialize Option
+
+```scala
+val opt: Option[Long] = Some(100)
+println(fury.deserialize(fury.serialize(opt)))
+val opt1: Option[Long] = None
+println(fury.deserialize(fury.serialize(opt1)))
+```
+
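+## Performance
+
+Scala `pojo/bean/case/object` types are well supported by the Apache Fury JIT, and their performance is as good as Apache Fury Java.
+
+Scala collections and generics do not follow the Java collection framework and are not fully integrated with the Apache Fury JIT in the current release, so their performance will not be as good as Fury collection serialization for Java.
+
+Serialization of Scala collections invokes the Java serialization API `writeObject/readObject/writeReplace/readResolve/readObjectNoData/Externalizable` through the Fury `ObjectStream` implementation. Although `org.apache.fury.serializer.ObjectStreamSerializer` is much faster than the JDK `ObjectOutputStream/ObjectInputStream`, it still doesn't know how to make use of Scala collection generics.
+
+We plan to provide more optimizations for Scala types in the future, so stay tuned. See [#682](https://github.com/apache/fury/issues/682) for more information.
+
+Scala collection serialization was completed in [#1073](https://github.com/apache/fury/pull/1073). If you want better performance, please use an Apache Fury snapshot version.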