jyswpp opened a new issue, #761:
URL: https://github.com/apache/geaflow/issues/761
I found that the output file from the WCC algorithm contains duplicate data,
and I suspect that intermediate results of the algorithm were also exported.
The WCC SQL is:
CREATE GRAPH cc_graph_test (
Vertex nodes (
id bigint ID
),
Edge edges (
srcId bigint SOURCE ID,
targetId bigint DESTINATION ID
)
) WITH (
storeType='memory',
shardCount = 1
);
INSERT INTO cc_graph_test.nodes(id) VALUES
(1),
(2),
(3),
(4),
(5),
(6);
INSERT INTO cc_graph_test.edges VALUES
(1, 2),
(2, 3),
(4, 5),
(5, 6)
;
CREATE TABLE IF NOT EXISTS cc_geaflow_test (
v_id int,
k_value VARCHAR
) WITH (
type='file',
`geaflow.dsl.table.parallelism`= 64,
`geaflow.dsl.source.parallelism` = 64,
`geaflow.file.persistent.config.json` = '{\'*******'}',
`geaflow.dsl.file.path` = '*******',
`geaflow.dsl.column.separator`='\s'
);
USE GRAPH cc_graph_test;
insert into cc_geaflow_test(v_id, k_value)
CALL wcc() YIELD (vid, component)
RETURN vid, component;
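For reference, the result I would expect for this graph can be computed independently of GeaFlow. The following is a minimal Python sketch using union-find. It assumes that each component is labeled with its smallest vertex id; that is an assumption about how wcc() names components, not something taken from the GeaFlow source. The function name and the "s" separator in the printed rows are only illustrative.

```python
def weakly_connected_components(vertices, edges):
    """Return {vertex: component label}, treating every edge as undirected."""
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        a, b = find(src), find(dst)
        if a != b:
            # Assumption: the smaller id becomes the component label.
            if a < b:
                parent[b] = a
            else:
                parent[a] = b

    return {v: find(v) for v in vertices}


# Same vertices and edges as the INSERT statements above.
vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (4, 5), (5, 6)]

expected = weakly_connected_components(vertices, edges)
for vid in sorted(expected):
    print(f"{vid}s{expected[vid]}")
```

Under that assumption the graph has two components, {1, 2, 3} and {4, 5, 6}, so I would expect exactly six rows, one per vertex.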
The output is:
1s1
1s1
2s1
1s1
2s1
3s1
1s1
2s1
3s1
4s4
1s1
2s1
3s1
4s4
5s4
1s1
2s1
3s1
4s4
5s4
6s4
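To make the duplication easier to see, here is a small Python sketch that analyzes the rows above. It only inspects the output as pasted in this issue; splitting the rows into batches whenever the first row repeats is my own heuristic, not something GeaFlow reports.

```python
from collections import Counter

# The 21 rows from the output file, copied verbatim.
observed = """
1s1
1s1 2s1
1s1 2s1 3s1
1s1 2s1 3s1 4s4
1s1 2s1 3s1 4s4 5s4
1s1 2s1 3s1 4s4 5s4 6s4
""".split()

counts = Counter(observed)
distinct_rows = sorted(counts)

# Heuristic: start a new batch every time the first row shows up again.
batches = []
for row in observed:
    if row == observed[0]:
        batches.append([])
    batches[-1].append(row)

final_batch = batches[-1]
# True if every batch is a leading slice of the last batch.
all_prefixes = all(batch == final_batch[:len(batch)] for batch in batches)

print("total rows:", len(observed))
print("distinct rows:", len(distinct_rows))
print("batch sizes:", [len(b) for b in batches])
print("each batch is a prefix of the last one:", all_prefixes)
```

The file has 21 rows but only 6 distinct ones, and the rows split into batches of 1, 2, 3, 4, 5 and 6 rows, each a prefix of the last batch. That pattern is why I suspect intermediate results are being written, although I have not confirmed the cause.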