I33Buckler commented on issue #1287: URL: https://github.com/apache/age/issues/1287#issuecomment-1783638830
> @I33Buckler Your `create_stop_times` function is **O(N^3)** and that is inside a doubly nested loop that also does **SELECT**s from tables for each level. This can lead to as much as **O(N^5)** (possibly more) runtime for the whole process. > > Even without considering the 2 external loops which could be **O(N^2)**, just the **O(N^3)** for `create_stop_times` is bad enough to make what you are doing only viable for small datasets. > > My suggestion is that you need to rewrite your code to make it more efficient. At this moment, this does not appear to be an Apache AGE issue. > > If you have additional questions or need some guidance, please do ask and we will try to help where possible. Thank you for the response. Very helpful. One of the reasons for looping was that the running the query on the table without looping resulted in it failing with the following messages in the postgresql log. ``` 023-10-26 23:13:45.675 AEDT [44900] LOG: checkpoint starting: time 2023-10-26 23:14:59.530 AEDT [44900] LOG: checkpoint complete: wrote 725 buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=72.978 s, sync=0.509 s, total=73.855 s; sync files=79, longest=0.089 s, average=0.007 s; distance=9734 kB, estimate=23911 kB 2023-10-26 23:16:36.209 AEDT [45561] matthew@tcanalysis LOG: could not receive data from client: Connection reset by peer 2023-10-26 23:16:36.210 AEDT [45558] matthew@tcanalysis LOG: could not receive data from client: Connection reset by peer 2023-10-26 23:16:36.210 AEDT [45559] matthew@tcanalysis LOG: could not receive data from client: Connection reset by peer 2023-10-26 23:16:36.210 AEDT [45572] matthew@tcanalysis LOG: could not receive data from client: Connection reset by peer 2023-10-26 23:16:36.212 AEDT [45560] matthew@tcanalysis LOG: could not receive data from client: Connection reset by peer 2023-10-26 23:16:36.212 AEDT [45562] matthew@tcanalysis LOG: could not receive data from client: Connection reset by peer 2023-10-26 23:17:20.903 AEDT [45099] matthew@tcanalysis LOG: could not send data to client: Broken pipe 2023-10-26 23:17:20.903 AEDT [45099] matthew@tcanalysis STATEMENT: select create_stop_times('transport_network', a.tc_date,a.trip_id,a.stop_id, a.arrival_time,a.arrival_time_sec, a.departure_time,a.departure_time_sec, a.row_num,a.pickup_type,a.drop_off_type) from stop_times a ; 2023-10-26 23:17:20.903 AEDT [45099] matthew@tcanalysis FATAL: connection to client lost 2023-10-26 23:17:20.903 AEDT [45099] matthew@tcanalysis STATEMENT: select create_stop_times('transport_network', a.tc_date,a.trip_id,a.stop_id, a.arrival_time,a.arrival_time_sec, a.departure_time,a.departure_time_sec, a.row_num,a.pickup_type,a.drop_off_type) from stop_times a ; 2023-10-26 23:18:45.540 AEDT [44900] LOG: checkpoint starting: time ``` I've partitioned the table by tc_date processing by looping through each tc_date which hopefully will improve the timing. The result is 23 separate steps for each tc_date. Hoping that my estimated total time for processing the entire table of 10 hours to be within reason. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@age.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org