I33Buckler commented on issue #1287:
URL: https://github.com/apache/age/issues/1287#issuecomment-1783638830

   > @I33Buckler Your `create_stop_times` function is **O(N^3)** and that is 
inside a doubly nested loop that also does **SELECT**s from tables for each 
level. This can lead to as much as **O(N^5)** (possibly more) runtime for the 
whole process.
   > 
   > Even without considering the 2 external loops which could be **O(N^2)**, 
just the **O(N^3)** for `create_stop_times` is bad enough to make what you are 
doing only viable for small datasets.
   > 
   > My suggestion is that you need to rewrite your code to make it more 
efficient. At this moment, this does not appear to be an Apache AGE issue.
   > 
   > If you have additional questions or need some guidance, please do ask and 
we will try to help where possible.
   
   Thank you for the response. Very helpful.
   One of the reasons for looping was that the running the query on the table 
without looping resulted in it failing with the following messages in the 
postgresql log.
   
   ```
   023-10-26 23:13:45.675 AEDT [44900] LOG:  checkpoint starting: time
   2023-10-26 23:14:59.530 AEDT [44900] LOG:  checkpoint complete: wrote 725 
buffers (0.1%); 0 WAL file(s) added, 0 removed, 1 recycled; write=72.978 s, 
sync=0.509 s, total=73.855 s; sync files=79, longest=0.089 s, average=0.007 s; 
distance=9734 kB, estimate=23911 kB
   2023-10-26 23:16:36.209 AEDT [45561] matthew@tcanalysis LOG:  could not 
receive data from client: Connection reset by peer
   2023-10-26 23:16:36.210 AEDT [45558] matthew@tcanalysis LOG:  could not 
receive data from client: Connection reset by peer
   2023-10-26 23:16:36.210 AEDT [45559] matthew@tcanalysis LOG:  could not 
receive data from client: Connection reset by peer
   2023-10-26 23:16:36.210 AEDT [45572] matthew@tcanalysis LOG:  could not 
receive data from client: Connection reset by peer
   2023-10-26 23:16:36.212 AEDT [45560] matthew@tcanalysis LOG:  could not 
receive data from client: Connection reset by peer
   2023-10-26 23:16:36.212 AEDT [45562] matthew@tcanalysis LOG:  could not 
receive data from client: Connection reset by peer
   2023-10-26 23:17:20.903 AEDT [45099] matthew@tcanalysis LOG:  could not send 
data to client: Broken pipe
   2023-10-26 23:17:20.903 AEDT [45099] matthew@tcanalysis STATEMENT:  select 
create_stop_times('transport_network',
                                                                                
a.tc_date,a.trip_id,a.stop_id,
                                                                                
a.arrival_time,a.arrival_time_sec,
                                                                                
a.departure_time,a.departure_time_sec,
                                                                                
a.row_num,a.pickup_type,a.drop_off_type)
        from stop_times a
        ;
   2023-10-26 23:17:20.903 AEDT [45099] matthew@tcanalysis FATAL:  connection 
to client lost
   2023-10-26 23:17:20.903 AEDT [45099] matthew@tcanalysis STATEMENT:  select 
create_stop_times('transport_network',
                                                                                
a.tc_date,a.trip_id,a.stop_id,
                                                                                
a.arrival_time,a.arrival_time_sec,
                                                                                
a.departure_time,a.departure_time_sec,
                                                                                
a.row_num,a.pickup_type,a.drop_off_type)
        from stop_times a
        ;
   2023-10-26 23:18:45.540 AEDT [44900] LOG:  checkpoint starting: time
   ```
   I've partitioned the table by tc_date processing by looping through each 
tc_date which hopefully will improve the timing.
   The result is 23 separate steps for each tc_date. 
   Hoping that my estimated total time for processing the entire table of 10 
hours to be within reason. 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@age.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to