Repository: beam
Updated Branches:
  refs/heads/master 1fafaa846 -> b1382969b

Remove Readme files.

extensions -> moving to website @ https://github.com/apache/beam-site/pull/237
jdbc/test -> obsolete content, removed.

Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/77712ea2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/77712ea2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/77712ea2

Branch: refs/heads/master
Commit: 77712ea2a1ad5685d587c5a8ae59a547a6930ae4
Parents: 1fafaa8
Author: Ahmet Altay <[email protected]>
Authored: Tue May 9 18:21:39 2017 -0700
Committer: Ahmet Altay <[email protected]>
Committed: Wed May 10 08:40:10 2017 -0700

----------------------------------------------------------------------
 sdks/java/extensions/join-library/README.md | 42 ------------------------
 sdks/java/extensions/sorter/README.md       | 42 ------------------------
 sdks/java/io/jdbc/src/test/README.md        | 32 ------------------
 3 files changed, 116 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/beam/blob/77712ea2/sdks/java/extensions/join-library/README.md
----------------------------------------------------------------------
diff --git a/sdks/java/extensions/join-library/README.md b/sdks/java/extensions/join-library/README.md
deleted file mode 100644
index feee64f..0000000
--- a/sdks/java/extensions/join-library/README.md
+++ /dev/null
@@ -1,42 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements. See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership. The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License. You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-Join-library
-============
-
-Join-library provides inner join, outer left and right join functions to
-Apache Beam. The aim is to simplify the most common cases of join to a
-simple function call.
-
-The functions are generic so it supports join of any types supported by
-Beam. Input to the join functions are PCollections of Key/Values. Both the
-left and right PCollections need the same type for the key. All the join
-functions return a Key/Value where Key is the join key and value is
-a Key/Value where the key is the left value and right is the value.
-
-In the cases of outer join, since null cannot be serialized the user have
-to provide a value that represent null for that particular use case.
-
-Example how to use join-library:
-
-    PCollection<KV<String, String>> leftPcollection = ...
-    PCollection<KV<String, Long>> rightPcollection = ...
-
-    PCollection<KV<String, KV<String, Long>>> joinedPcollection =
-      Join.innerJoin(leftPcollection, rightPcollection);

http://git-wip-us.apache.org/repos/asf/beam/blob/77712ea2/sdks/java/extensions/sorter/README.md
----------------------------------------------------------------------
diff --git a/sdks/java/extensions/sorter/README.md b/sdks/java/extensions/sorter/README.md
deleted file mode 100644
index 6ff3dbe..0000000
--- a/sdks/java/extensions/sorter/README.md
+++ /dev/null
@@ -1,42 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements. See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership. The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License. You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-#Sorter
-This module provides the SortValues transform, which takes a `PCollection<KV<K, Iterable<KV<K2, V>>>>` and produces a `PCollection<KV<K, Iterable<KV<K2, V>>>>` where, for each primary key `K` the paired `Iterable<KV<K2, V>>` has been sorted by the byte encoding of secondary key (`K2`). It will efficiently and scalably sort the iterables, even if they are large (do not fit in memory).
-
-##Caveats
-* This transform performs value-only sorting; the iterable accompanying each key is sorted, but *there is no relationship between different keys*, as Beam does not support any defined relationship between different elements in a PCollection.
-* Each `Iterable<KV<K2, V>>` is sorted on a single worker using local memory and disk. This means that `SortValues` may be a performance and/or scalability bottleneck when used in different pipelines. For example, users are discouraged from using `SortValues` on a `PCollection` of a single element to globally sort a large `PCollection`. A (rough) estimate of the number of bytes of disk space utilized if sorting spills to disk is `numRecords * (numSecondaryKeyBytesPerRecord + numValueBytesPerRecord + 16) * 3`.
-
-##Options
-* The user can customize the temporary location used if sorting requires spilling to disk and the maximum amount of memory to use by creating a custom instance of `BufferedExternalSorter.Options` to pass into `SortValues.create`.
-
-##Using `SortValues`
-```java
-PCollection<KV<String, KV<String, Integer>>> input = ...
-
-// Group by primary key, bringing <SecondaryKey, Value> pairs for the same key together.
-PCollection<KV<String, Iterable<KV<String, Integer>>>> grouped =
-    input.apply(GroupByKey.<String, KV<String, Integer>>create());
-
-// For every primary key, sort the iterable of <SecondaryKey, Value> pairs by secondary key.
-PCollection<KV<String, Iterable<KV<String, Integer>>>> groupedAndSorted =
-    grouped.apply(
-        SortValues.<String, String, Integer>create(new BufferedExternalSorter.Options()));
-```

http://git-wip-us.apache.org/repos/asf/beam/blob/77712ea2/sdks/java/io/jdbc/src/test/README.md
----------------------------------------------------------------------
diff --git a/sdks/java/io/jdbc/src/test/README.md b/sdks/java/io/jdbc/src/test/README.md
deleted file mode 100644
index 5a7ac99..0000000
--- a/sdks/java/io/jdbc/src/test/README.md
+++ /dev/null
@@ -1,32 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements. See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership. The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License. You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-These are instructions for maintaining postgres as needed for Integration Tests (JdbcIOIT).
-
-You can always ignore these instructions if you have your own postgres cluster to test against.
-
-Setting up Postgres
--------------------
-1. Setup kubectl so it is configured to work with your kubernetes cluster
-1. Run the postgres setup script
-    src/test/resources/kubernetes/setup.sh
-1. Do the data loading - create the data store instance by following the instructions in JdbcTestDataSet
-
-... and your postgres instances are set up!
-
