Re: Parallel copy

2022-03-06 Thread Bharath Rupireddy
On Mon, Dec 28, 2020 at 3:14 PM vignesh C wrote: > > Attached is a patch that was used for the same. The patch is written > on top of the parallel copy patch. > The design Amit, Andres & myself voted for that is the leader > identifying the line bound design and sharing

Re: Parallel copy

2020-12-28 Thread vignesh C
e leader doesn't find the line-endings the workers need to wait > > > till the leader fill the entire 64K chunk, OTOH, with current approach > > > the worker can start as soon as leader is able to populate some > > > minimum number of line-endings > > > > You can u

Re: Parallel copy

2020-12-26 Thread vignesh C
On Wed, Dec 23, 2020 at 3:05 PM Hou, Zhijie wrote: > > Hi > > > Yes this optimization can be done, I will handle this in the next patch > > set. > > > > I have a suggestion for the parallel safety-check. > > As designed, The leader does not participate in the insertion of data. > If User use

RE: Parallel copy

2020-12-23 Thread Hou, Zhijie
Hi > Yes this optimization can be done, I will handle this in the next patch > set. > I have a suggestion for the parallel safety-check. As designed, The leader does not participate in the insertion of data. If User use (PARALLEL 1), there is only one worker process which will do the

Re: Parallel copy

2020-12-09 Thread vignesh C
On Mon, Dec 7, 2020 at 3:00 PM Hou, Zhijie wrote: > > > Attached v11 patch has the fix for this, it also includes the changes to > > rebase on top of head. > > Thanks for the explanation. > > I think there is still chances we can know the size. > > +* line_size will be set. Read

RE: Parallel copy

2020-12-07 Thread Hou, Zhijie
> > 4. > > A suggestion for CacheLineInfo. > > > > It use appendBinaryStringXXX to store the line in memory. > > appendBinaryStringXXX will double the str memory when there is no enough > spaces. > > > > How about call enlargeStringInfo in advance, if we already know the whole > line size? > > It
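The suggestion above (reserve the whole line up front instead of letting the buffer double repeatedly) can be illustrated with a small stand-alone sketch. `SketchBuf`, `sketch_reserve`, `sketch_append` and `count_reallocs` are hypothetical names invented for this illustration; they only mimic the doubling growth described in the message and are not PostgreSQL's StringInfo implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; a stand-in for StringInfo, not its real layout. */
typedef struct SketchBuf
{
    char   *data;
    size_t  len;        /* bytes used */
    size_t  cap;        /* bytes allocated */
    int     nreallocs;  /* number of times the buffer had to grow */
} SketchBuf;

/* Ensure room for `needed` more bytes, doubling the capacity as required. */
static void
sketch_reserve(SketchBuf *b, size_t needed)
{
    size_t  newcap = b->cap;
    char   *p;

    if (b->len + needed <= b->cap)
        return;
    while (newcap < b->len + needed)
        newcap *= 2;
    p = realloc(b->data, newcap);
    if (p == NULL)
        abort();
    b->data = p;
    b->cap = newcap;
    b->nreallocs++;
}

/* Append n bytes, growing the buffer only if they do not fit. */
static void
sketch_append(SketchBuf *b, const char *src, size_t n)
{
    sketch_reserve(b, n);
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

/*
 * Append a line of line_size bytes in pieces of `chunk` bytes (chunk > 0)
 * and return how many reallocations happened. With reserve_first set, the
 * full line size is reserved before the first append.
 */
static int
count_reallocs(size_t line_size, size_t chunk, int reserve_first)
{
    SketchBuf b;
    char     *src = calloc(chunk, 1);
    size_t    off;
    int       result;

    b.data = malloc(64);
    b.len = 0;
    b.cap = 64;
    b.nreallocs = 0;
    if (src == NULL || b.data == NULL)
        abort();

    if (reserve_first)
        sketch_reserve(&b, line_size);

    for (off = 0; off < line_size; off += chunk)
    {
        size_t n = (line_size - off < chunk) ? line_size - off : chunk;

        sketch_append(&b, src, n);
    }

    result = b.nreallocs;
    free(b.data);
    free(src);
    return result;
}
```

Under these assumptions (64-byte initial capacity, 100-byte pieces), a 1000-byte line grows the buffer several times when appended piecewise, but only once when its size is reserved in advance. Whether that matters in practice depends on how often the line size is actually known beforehand, which is the point under discussion in the thread.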

RE: Parallel copy

2020-11-19 Thread Hou, Zhijie
Hi Vignesh, I took a look at the v10 patch set. Here are some comments: 1. +/* + * CheckExprParallelSafety + * + * Determine if where cluase and default expressions are parallel safe & do not + * have volatile expressions, return true if condition satisfies else return + * false. + */ 'cluase'

Re: Parallel copy

2020-11-18 Thread vignesh C
tience to wait > > > it finish). Both worker processes are consuming 100% of CPU. > > > > I had a look over this problem. > > > > the ParallelCopyDataBlock has size limit: > > uint8 skip_bytes; > > char data[DATA_BLOC

Re: Parallel copy

2020-11-18 Thread vignesh C
1]. The random_string() generates a random string > with ASCII characters, symbols and a couple special characters (\r\n\t). > The intent was to try loading data where a fields may span multiple 64kB > blocks and may contain newlines etc. > > The non-parallel copy works fine, the parallel

Re: Parallel copy

2020-11-18 Thread vignesh C
t; > > > On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila > > > > wrote: > > > > > > > > > > > > > I have worked to provide a patch for the parallel safety checks. It > > > > checks if parallely copy can be performed, Parallel copy cannot be &

Re: Parallel copy

2020-11-18 Thread vignesh C
On Thu, Oct 29, 2020 at 2:26 PM Daniel Westermann (DWE) wrote: > > On 27/10/2020 15:36, vignesh C wrote: > >> Attached v9 patches have the fixes for the above comments. > > >I did some testing: > > I did some testing as well and have a cosmetic remark: > > postgres=# copy t1 from

Re: Parallel copy

2020-11-18 Thread vignesh C
On Wed, Oct 28, 2020 at 5:36 PM Hou, Zhijie wrote: > > Hi > > I found some issue in v9-0002 > > 1. > + > + elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d", > +write_pos, lineInfo->first_block, > +

Re: Parallel copy

2020-11-18 Thread vignesh C
On Thu, Oct 29, 2020 at 2:20 PM Heikki Linnakangas wrote: > > On 27/10/2020 15:36, vignesh C wrote: > > Attached v9 patches have the fixes for the above comments. > > I did some testing: > > /tmp/longdata.pl: > > #!/usr/bin/perl > # > # Generate three rows: > # foo > #

Re: Parallel copy

2020-11-17 Thread Bharath Rupireddy
creating any contention point inside the parallel copy code. However this is causing another choking point i.e. index insertion if indexes are available on the table, which is out of scope of parallel copy code. We think that it would be good to use spinlock-protected worker write position or an at

Re: Parallel copy

2020-11-13 Thread Amit Kapila
> > > > > > > > > I have worked to provide a patch for the parallel safety checks. It > > > checks if parallely copy can be performed, Parallel copy cannot be > > > performed for the following a) If relation is temporary table b) If > > > relation is foreign

Re: Parallel copy

2020-11-11 Thread vignesh C
> checks if parallely copy can be performed, Parallel copy cannot be > > performed for the following a) If relation is temporary table b) If > > relation is foreign table c) If relation has non parallel safe index > > expressions d) If relation has triggers present whose type is of

Re: Parallel copy

2020-11-10 Thread Amit Kapila
On Tue, Nov 10, 2020 at 7:12 PM vignesh C wrote: > > On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila wrote: > > > > I have worked to provide a patch for the parallel safety checks. It > checks if parallely copy can be performed, Parallel copy cannot be > performed for the

Re: Parallel copy

2020-11-10 Thread vignesh C
e leader doesn't find the line-endings the workers need to wait > > > till the leader fill the entire 64K chunk, OTOH, with current approach > > > the worker can start as soon as leader is able to populate some > > > minimum number of line-endings > > > > You can u

Re: Parallel copy

2020-11-07 Thread vignesh C
. > > the ParallelCopyDataBlock has size limit: > uint8 skip_bytes; > char data[DATA_BLOCK_SIZE]; /* data read from file */ > > It seems the input line is so long that the leader process run out of the > Shared memory among parallel copy wor

RE: Parallel copy

2020-11-05 Thread Hou, Zhijie
om file */ It seems the input line is so long that the leader process run out of the Shared memory among parallel copy workers. And the leader process keep waiting free block. For the worker process, it wait until line_state becomes LINE_LEADER_POPULATED, But leader process won't set the line_

Re: Parallel copy

2020-11-03 Thread Heikki Linnakangas
On 03/11/2020 10:59, Amit Kapila wrote: On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas wrote: However, the point of parallel copy is to maximize bandwidth. Okay, but this first-phase (finding the line boundaries) can anyway be not done in parallel and we have seen in some of the initial

Re: Parallel copy

2020-11-03 Thread Amit Kapila
opulate some > > minimum number of line-endings > > You can use a smaller block size. > Sure, but the same problem can happen if the last line in that block is too long and we need to peek into the next block. And then there could be cases where a single line could be greater than 64

Re: Parallel copy

2020-11-01 Thread Heikki Linnakangas
On 02/11/2020 09:10, Heikki Linnakangas wrote: On 02/11/2020 08:14, Amit Kapila wrote: We have discussed both these approaches (a) single producer multiple consumer, and (b) all workers doing the processing as you are saying in the beginning and concluded that (a) is better, see some of the

Re: Parallel copy

2020-11-01 Thread Heikki Linnakangas
You can use a smaller block size. However, the point of parallel copy is to maximize bandwidth. If the workers ever have to sit idle, it means that the bottleneck is in receiving data from the client, i.e. the backend is fast enough, and you can't make the overall COPY finish any faster no matter

Re: Parallel copy

2020-11-01 Thread Amit Kapila
On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas wrote: > > Leader process: > > The leader process is simple. It picks the next FREE buffer, fills it > with raw data from the file, and marks it as FILLED. If no buffers are > FREE, wait. > > Worker process: > > 1. Claim next READY block from

Re: Parallel copy

2020-10-31 Thread Tomas Vondra
On Sat, Oct 31, 2020 at 12:09:32AM +0200, Heikki Linnakangas wrote: On 30/10/2020 22:56, Tomas Vondra wrote: I agree this design looks simpler. I'm a bit worried about serializing the parsing like this, though. It's true the current approach (where the first phase of parsing happens in the

Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas
On 30/10/2020 22:56, Tomas Vondra wrote: I agree this design looks simpler. I'm a bit worried about serializing the parsing like this, though. It's true the current approach (where the first phase of parsing happens in the leader) has a similar issue, but I think it would be easier to improve

Re: Parallel copy

2020-10-30 Thread Tomas Vondra
On Fri, Oct 30, 2020 at 06:41:41PM +0200, Heikki Linnakangas wrote: On 30/10/2020 18:36, Heikki Linnakangas wrote: I find this design to be very complicated. Why does the line-boundary information need to be in shared memory? I think this would be much simpler if each worker grabbed a

Re: Parallel copy

2020-10-30 Thread Tomas Vondra
ocks and may contain newlines etc. The non-parallel copy works fine, the parallel one fails. I haven't investigated the details, but I guess it gets confused about where a string starts/end, or something like that. [1] https://github.com/tvondra/random regards -- Tomas Vondra http:/

Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas
be in order. It probably would be faster, or at least not slower, to find all the EOLs in a block in one tight loop, even when parallel copy is not used. Something like the attached. It passes the regression tests, but it's quite incomplete. It's missing handling of "\." as end-of-file mark
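As a rough illustration of "find all the EOLs in a block in one tight loop", the sketch below scans a raw block once and records the offset of every newline. It is not the patch attached to the message: `find_line_endings` is a hypothetical helper, and it deliberately ignores CSV quoting, backslash escapes, "\r\n" line endings and the "\." end-of-file marker, all of which a real COPY implementation has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Record the offset of every '\n' found in buf[0..len) into offsets[],
 * storing at most max_offsets entries. Returns the number of offsets stored.
 *
 * Simplifying assumptions (not true of real COPY input): no quoting,
 * no escape sequences, no "\r\n" handling, no "\." marker.
 */
static size_t
find_line_endings(const char *buf, size_t len,
                  size_t *offsets, size_t max_offsets)
{
    size_t      found = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (found < max_offsets && p < end)
    {
        /* memchr does the scanning; we only touch the loop body per line. */
        const char *nl = memchr(p, '\n', (size_t) (end - p));

        if (nl == NULL)
            break;              /* the last line continues in the next block */
        offsets[found++] = (size_t) (nl - buf);
        p = nl + 1;
    }
    return found;
}
```

For a block holding `"foo\nlongdata\nbar"`, the function reports two line endings and leaves the trailing `bar` for whoever processes the next block, which is the situation (a line spilling across a block boundary) that much of the thread's design debate is about.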

Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas
On 30/10/2020 18:36, Heikki Linnakangas wrote: I find this design to be very complicated. Why does the line-boundary information need to be in shared memory? I think this would be much simpler if each worker grabbed a fixed-size block of raw data, and processed that. In your patch, the leader

Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas
that it needs to be done ASAP, for a chunk at a time, because that cannot be done in parallel. I think some refactoring in CopyReadLine() and friends would be in order. It probably would be faster, or at least not slower, to find all the EOLs in a block in one tight loop, even when parallel copy

Re: Parallel copy

2020-10-29 Thread Amit Kapila
On Thu, Oct 29, 2020 at 11:45 AM Amit Kapila wrote: > > On Tue, Oct 27, 2020 at 7:06 PM vignesh C wrote: > > > [latest version] > > I think the parallel-safety checks in this patch > (v9-0002-Allow-copy-from-command-to-process-data-from-file) are > incomplete and wrong. > One more point, I have

Re: Parallel copy

2020-10-29 Thread Daniel Westermann (DWE)
On 27/10/2020 15:36, vignesh C wrote: >> Attached v9 patches have the fixes for the above comments. >I did some testing: I did some testing as well and have a cosmetic remark: postgres=# copy t1 from '/var/tmp/aa.txt' with (parallel 10); ERROR: value 10 out of bounds for option

Re: Parallel copy

2020-10-29 Thread Heikki Linnakangas
On 27/10/2020 15:36, vignesh C wrote: Attached v9 patches have the fixes for the above comments. I did some testing: /tmp/longdata.pl: #!/usr/bin/perl # # Generate three rows: # foo # longdatalongdatalongdata... # bar # # The length of the middle row is given as command line arg. #

Re: Parallel copy

2020-10-29 Thread Amit Kapila
volatile functions? It should be checked otherwise as well, no? The similar comment applies to other checks in this function. Also, I don't think there is a need to make this function inline. 2. +/* + * IsParallelCopyAllowed + * + * Check if parallel copy can be allowed. + */ +bool +IsParallelCo

RE: Parallel copy

2020-10-28 Thread Hou, Zhijie
Hi I found some issue in v9-0002 1. + + elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d", +write_pos, lineInfo->first_block, +pg_atomic_read_u32(_blk_ptr->unprocessed_line_parts), +

Re: Parallel copy

2020-10-27 Thread vignesh C
n > > IsParallelCopyAllowed(). This will ensure that in case of Parallel > > Copy when the leader has performed all these checks, the worker won't > > do it again. I also feel that it will make the code look a bit > > cleaner. > > > > Just rewriting above comment to make i

Re: Parallel copy

2020-10-27 Thread vignesh C
ssed as you have suggested but relid need to be passed as we will be setting it to pcdata, modified nworkers as suggested. > -- > > +/* DSM keys for parallel copy. */ > +#define PARALLEL_COPY_KEY_SHARED_INFO 1 > +#define PARALLEL_COPY_KEY_CSTATE 2 > +#defin

Re: Parallel copy

2020-10-27 Thread vignesh C
On Wed, Oct 21, 2020 at 3:50 PM Amit Kapila wrote: > > On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy > wrote: > > > > > > 9. Instead of calling CopyStringToSharedMemory() for each string > > variable, can't we just create a linked list of all the strings that > > need to be copied into shm

Re: Parallel copy

2020-10-27 Thread vignesh C
data block, what is the starting offset in the data block, what is the line size, this information will be present in ParallelCopyLineBoundary. Like you said, each worker processes WORKER_CHUNK_COUNT 64 lines at a time. Performance test results run for parallel copy are avai

Re: Parallel copy

2020-10-27 Thread vignesh C
On Wed, Oct 21, 2020 at 4:20 PM Bharath Rupireddy wrote: > > On Wed, Oct 21, 2020 at 3:18 PM Bharath Rupireddy > wrote: > > > > 17. Remove extra lines after #define IsHeaderLine() > > (cstate->header_line && cstate->cur_lineno == 1) in copy.h > > > > I missed one comment: > > 18. I think we

Re: Parallel copy

2020-10-23 Thread Ashutosh Sharma
opy(cstate->nworkers, cstate, stmt->attlist, > +relid); > > Do we need to pass cstate->nworkers and relid to BeginParallelCopy() > function when we are already passing cstate structure, using which > both of these inform

Re: Parallel copy

2020-10-23 Thread Ashutosh Sharma
hich both of these information can be retrieved? -- +/* DSM keys for parallel copy. */ +#define PARALLEL_COPY_KEY_SHARED_INFO 1 +#define PARALLEL_COPY_KEY_CSTATE 2 +#define PARALLEL_COPY_WAL_USAGE 3 +#define PARALLEL_COPY_BUFFER_USAGE 4 DSM key

Re: Parallel copy

2020-10-23 Thread Heikki Linnakangas
I had a brief look at this patch. Important work! A couple of first impressions: 1. The split between patches 0002-Framework-for-leader-worker-in-parallel-copy.patch and 0003-Allow-copy-from-command-to-process-data-from-file.patch is quite artificial. All the stuff introduced in the first

Re: Parallel copy

2020-10-21 Thread Bharath Rupireddy
On Wed, Oct 21, 2020 at 3:18 PM Bharath Rupireddy wrote: > > 17. Remove extra lines after #define IsHeaderLine() > (cstate->header_line && cstate->cur_lineno == 1) in copy.h > I missed one comment: 18. I think we need to treat the number of parallel workers as an integer similar to the

Re: Parallel copy

2020-10-21 Thread Amit Kapila
On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy wrote: > > > 9. Instead of calling CopyStringToSharedMemory() for each string > variable, can't we just create a linked list of all the strings that > need to be copied into shm and call CopyStringToSharedMemory() only > once? We could avoid 5

Re: Parallel copy

2020-10-21 Thread Bharath Rupireddy
Hi Vignesh, I took a look at the v8 patch set. Here are some comments: 1. PopulateCommonCstateInfo() -- can we use PopulateCommonCStateInfo() or PopulateCopyStateInfo()? And also EstimateCstateSize() -- EstimateCStateSize(), PopulateCstateCatalogInfo() -- PopulateCStateCatalogInfo()? 2. Instead

Re: Parallel copy

2020-10-21 Thread vignesh C
and details shared by bharath at [1] > 3) Support of parallel copy for COPY_OLD_FE. It is handled as part of v8 patch shared at [2] > 4) Worker has to hop through all the processed chunks before getting > the chunk which it can process. Open > 5) Handling of Tomas's comments.

Re: Parallel copy

2020-10-20 Thread Bharath Rupireddy
On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > > On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote: > > > > 2. Do we have tests for toast tables? I think if you implement the > > previous point some existing tests might cover it but I feel we

Re: Parallel copy

2020-10-19 Thread Amit Kapila
On Sun, Oct 18, 2020 at 7:47 AM Hou, Zhijie wrote: > > Hi Vignesh, > > After having a look over the patch, > I have some suggestions for > 0003-Allow-copy-from-command-to-process-data-from-file.patch. > > 1. > > +static uint32 > +EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List >

RE: Parallel copy

2020-10-17 Thread Hou, Zhijie
Hi Vignesh, After having a look over the patch, I have some suggestions for 0003-Allow-copy-from-command-to-process-data-from-file.patch. 1. +static uint32 +EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist, + char **whereClauseStr,

Re: Parallel copy

2020-10-15 Thread Amit Kapila
>                         | wal_records | wal_fpi | wal_bytes
> Sequential Copy         | 1116        | 0       | 3587669
> Parallel Copy(1 worker) | 1116        | 0       | 3587669
> Parallel Copy(4 worker) | 1121        | 0       | 3587668
>
I noticed

Re: Parallel copy

2020-10-14 Thread vignesh C
On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow wrote: > > On Thu, Oct 8, 2020 at 5:44 AM vignesh C wrote: > > > Attached v6 patch with the fixes. > > > > Hi Vignesh, > > I noticed a couple of issues when scanning the code in the following patch: > >

Re: Parallel copy

2020-10-14 Thread vignesh C
from TPC-H - for 75GB > data set, this largest table is about 64GB once loaded, with another > 54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and > NVME storage. > > The COPY duration with varying number of workers (specified using the > parallel COPY option) looks l

Re: Parallel copy

2020-10-14 Thread vignesh C
st/bin/hw_175000.csv' with(format > > csv, delimiter ',', parallel '2') | 0 | 0 | > > 0 | 0 | 0 |0 | > > 1 |35668.402482 | 35668.402482 | 35668.402482 | 35668.402482 > > | 0

Re: Parallel copy

2020-10-14 Thread vignesh C
r > > > thread [1] and performance data shown by Peter that this can't be an > > > independent improvement and rather in some cases it can do harm. Now, > > > if you need it for a parallel-copy path then we can change it > > > specifically to the parallel-copy code pa

Re: Parallel copy

2020-10-14 Thread Bharath Rupireddy
I did performance testing on v7 patch set[1] with custom postgresql.conf[2]. The results are of the triplet form (exec time in sec, number of workers, gain) Use case 1: 10million rows, 5.2GB data, 2 indexes on integer columns, 1 index on text column, binary file (1104.898, 0, 1X), (1112.221, 1,

Re: Parallel copy

2020-10-09 Thread Amit Kapila
new worker code. You can have this as a test-only patch for now and > > > > make sure all existing tests passed with this. > > > > > > > > > > I don't think all the existing copy test cases(except the new test cases > > > added in the parallel copy patch set) would run i

Re: Parallel copy

2020-10-09 Thread Bharath Rupireddy
t; > > I don't think all the existing copy test cases(except the new test cases > > added in the parallel copy patch set) would run inside the parallel worker > > if force_parallel_mode is on. This is because, the parallelism will be > > picked up for parallel copy only if

Re: Parallel copy

2020-10-09 Thread Amit Kapila
egression will be executed via > > new worker code. You can have this as a test-only patch for now and > > make sure all existing tests passed with this. > > > > I don't think all the existing copy test cases(except the new test cases > added in the para

Re: Parallel copy

2020-10-09 Thread Greg Nancarrow
On Fri, Oct 9, 2020 at 5:40 PM Amit Kapila wrote: > > > Looking a bit deeper into this, I'm wondering if in fact your > > EstimateStringSize() and EstimateNodeSize() functions should be using > > BUFFERALIGN() for EACH stored string/node (rather than just calling > > shm_toc_estimate_chunk() once
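The difference between aligning once over the summed size and aligning each stored item can be shown with a few lines of arithmetic. This is only a sketch of the estimation question raised above: `SKETCH_ALIGNOF` (32 bytes) and both functions are assumptions made for the illustration, not PostgreSQL's `BUFFERALIGN()` macro or the patch's `EstimateStringSize()`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment for this sketch; the real value is platform-defined. */
#define SKETCH_ALIGNOF 32

/* Round len up to the next multiple of SKETCH_ALIGNOF (a power of two). */
#define SKETCH_ALIGN(len) \
    (((size_t) (len) + (SKETCH_ALIGNOF - 1)) & ~((size_t) (SKETCH_ALIGNOF - 1)))

/* Sum all item sizes first, then align the total once. */
static size_t
estimate_aligned_once(const size_t *lens, size_t n)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += lens[i];
    return SKETCH_ALIGN(total);
}

/* Align every item separately, as if each were stored in its own chunk. */
static size_t
estimate_aligned_each(const size_t *lens, size_t n)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += SKETCH_ALIGN(lens[i]);
    return total;
}
```

With item sizes of 5, 40 and 1 bytes, aligning once gives 64 bytes while aligning each item gives 128, so an estimate made the first way would be too small if every item ends up in a separately aligned chunk.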

Re: Parallel copy

2020-10-09 Thread Amit Kapila
On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow wrote: > > On Thu, Oct 8, 2020 at 5:44 AM vignesh C wrote: > > > Attached v6 patch with the fixes. > > > > Hi Vignesh, > > I noticed a couple of issues when scanning the code in the following patch: > >

Re: Parallel copy

2020-10-08 Thread Amit Kapila
On Thu, Oct 8, 2020 at 12:14 AM vignesh C wrote: > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > + */ > > > > +typedef struct ParallelCopyLineBoundary > > > > > > > > Are we doing all this state management to avoid using locks while > > > > processing lines? If so, I think we

Re: Parallel copy

2020-10-08 Thread Amit Kapila
> independent improvement and rather in some cases it can do harm. Now, > > if you need it for a parallel-copy path then we can change it > > specifically to the parallel-copy code path but I don't understand > > your reason completely. > > > > Whenever w

Re: Parallel copy

2020-10-07 Thread vignesh C
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote: > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > Few additional comments: > > == > > Some more comments: > > v5-0002-Framewor

Re: Parallel copy

2020-10-07 Thread vignesh C
On Mon, Sep 28, 2020 at 6:37 PM Ashutosh Sharma wrote: > > On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila wrote: > > > > On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote: > > > > > > Thanks Ashutosh for your comments. > > > > > > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma > > > wrote: > > >

Re: Parallel copy

2020-10-07 Thread Greg Nancarrow
On Thu, Oct 8, 2020 at 5:44 AM vignesh C wrote: > Attached v6 patch with the fixes. > Hi Vignesh, I noticed a couple of issues when scanning the code in the following patch: v6-0003-Allow-copy-from-command-to-process-data-from-file.patch In the following code, it will put a junk uint16

Re: Parallel copy

2020-10-07 Thread vignesh C
On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila wrote: > > On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote: > > > > Thanks Ashutosh for your comments. > > > > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma > > wrote: > > > > > > Hi Vignesh, > > > > > > I've spent some time today looking at your

Re: Parallel copy

2020-10-07 Thread vignesh C
On Tue, Sep 29, 2020 at 3:16 PM Greg Nancarrow wrote: > > Hi Vignesh and Bharath, > > Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as > parallel-unsafe. > Can you explain why this is? Yes we don't need to restrict parallelism for RI_TRIGGER_PK cases as we do

Re: Parallel copy

2020-10-03 Thread Amit Kapila
from TPC-H - for 75GB > data set, this largest table is about 64GB once loaded, with another > 54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and > NVME storage. > > The COPY duration with varying number of workers (specified using the > parallel COPY option) looks l

Re: Parallel copy

2020-10-02 Thread Tomas Vondra
, with another 54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and NVME storage. The COPY duration with varying number of workers (specified using the parallel COPY option) looks like this:
workers  duration
-------  --------
0        1366
1        1255

Re: Parallel copy

2020-10-01 Thread Amit Kapila
On Tue, Sep 29, 2020 at 3:16 PM Greg Nancarrow wrote: > > Hi Vignesh and Bharath, > > Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as > parallel-unsafe. > Can you explain why this is? > I don't think we need to restrict this case and even if there is

Re: Parallel copy

2020-09-29 Thread vignesh C
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote: > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > Few additional comments: > > == > > Some more comments: > Thanks Amit for the comments, I will work on the comments and provide a patch in the next few days.

Re: Parallel copy

2020-09-29 Thread Amit Kapila
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > Few additional comments: > == Some more comments: v5-0002-Framework-for-leader-worker-in-parallel-copy === 1. These values + * help in handover of multipl

Re: Parallel copy

2020-09-29 Thread Greg Nancarrow
Hi Vignesh and Bharath, Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as parallel-unsafe. Can you explain why this is? Regards, Greg Nancarrow Fujitsu Australia

Re: Parallel copy

2020-09-28 Thread Ashutosh Sharma
On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila wrote: > > On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote: > > > > Thanks Ashutosh for your comments. > > > > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma > > wrote: > > > > > > Hi Vignesh, > > > > > > I've spent some time today looking at your

Re: Parallel copy

2020-09-28 Thread Amit Kapila
On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote: > > Thanks Ashutosh for your comments. > > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma wrote: > > > > Hi Vignesh, > > > > I've spent some time today looking at your new set of patches and I've > > some thoughts and queries which I would like

Re: Parallel copy

2020-09-28 Thread Amit Kapila
On Wed, Jul 22, 2020 at 7:48 PM vignesh C wrote: > > On Tue, Jul 21, 2020 at 3:54 PM Amit Kapila wrote: > > > > > Review comments: > > === > > > > 0001-Copy-code-readjustment-to-support-parallel-copy > > 1. > > @@ -807,8 +83

Re: Parallel copy

2020-09-24 Thread Ashutosh Sharma
On Thu, Sep 24, 2020 at 3:00 PM Bharath Rupireddy wrote: > > > > > > Have you tested your patch when encoding conversion is needed? If so, > > > could you please point out the email that has the test results. > > > > > > > We have not yet done encoding testing, we will do and post the results > >

Re: Parallel copy

2020-09-24 Thread Bharath Rupireddy
> > > Have you tested your patch when encoding conversion is needed? If so, > > could you please point out the email that has the test results. > > > > We have not yet done encoding testing, we will do and post the results > separately in the coming days. > Hi Ashutosh, I ran the tests ensuring

Re: Parallel copy

2020-09-24 Thread Bharath Rupireddy
0:57:08.927 JST [83335] LOG: totaltableinsertiontime = 17133.251 ms 2020-09-24 10:58:17.420 JST [90905] LOG: totaltableinsertiontime = 15352.753 ms > > Test results show that Parallel COPY with 1 worker is performing > better than normal COPY in the test scenarios run. > Good to k

Re: Parallel copy

2020-09-23 Thread Greg Nancarrow
> processes(except system processes) are running. Is it possible for you to do > the same? > > Please capture and share the timing logs with us. > Yes, I have ensured the system is as idle as possible prior to testing. I have attached the test results obtained after buildin

Re: Parallel copy

2020-09-22 Thread Bharath Rupireddy
onfiguration, 1 worker: 156.299, 153.293, 170.307 > > With Patch, custom configuration, 0 worker: 197.234, 195.866, 196.049 > With Patch, custom configuration, 1 worker: 157.173, 158.287, 157.090 > Hi Greg, If you still observe the issue in your testing environment, I'm attaching

Re: Parallel copy

2020-09-16 Thread Bharath Rupireddy
On Wed, Sep 16, 2020 at 1:20 PM Greg Nancarrow wrote: > > Fortunately I have been given permission to share the exact table > definition and data I used, so you can check the behaviour and timings > on your own test machine. > Thanks Greg for the script. I ran your test case and I didn't observe

Re: Parallel copy

2020-09-16 Thread Ashutosh Sharma
ompared to the workers then the leader quickly populates one line and > sets the state to LINE_LEADER_POPULATED. State is changed to > LINE_LEADER_POPULATED when we are checking the currr_line_state. > I feel this will not be a problem because, Leader will populate & wait > till some RING

Re: Parallel copy

2020-09-16 Thread Greg Nancarrow
(you have tested, right?). > 3. Was the run performed on release build? For generating the perf data I sent (normal copy vs parallel copy with 1 worker), I used a debug build (-g -O0), as that is needed for generating all the relevant perf data for Postgres code. Previously I ran with a relea

Re: Parallel copy

2020-09-15 Thread Bharath Rupireddy
llowing results from loading a 2GB CSV file (100
> rows, 4 indexes):
>
> Copy Type       Duration (s)   Load factor
> ===
> Normal Copy     190.891        -
>
> Parallel Copy
> (#workers)
> 1

Re: Parallel copy

2020-09-07 Thread vignesh C
On Tue, Sep 1, 2020 at 3:39 PM Greg Nancarrow wrote: > > Hi Vignesh, > > >Can you share with me the script you used to generate the data & the ddl of > >the table, so that it will help me check that >scenario you faced the > >>problem. > > Unfortunately I can't directly share it (considered

Re: Parallel copy

2020-09-03 Thread Greg Nancarrow
table definition, multiplying the number of records to produce a 5GB and 9.5GB CSV file. I got the following results:
(1) Postgres default settings, 5GB CSV (53 rows):
Copy Type       Duration (s)   Load factor
===
Normal Copy     132.1

Re: Parallel copy

2020-09-01 Thread vignesh C
On Tue, Sep 1, 2020 at 3:39 PM Greg Nancarrow wrote: > > Hi Vignesh, > > >Can you share with me the script you used to generate the data & the ddl of > >the table, so that it will help me check that >scenario you faced the > >>problem. > > Unfortunately I can't directly share it (considered

Re: Parallel copy

2020-09-01 Thread Greg Nancarrow
Hi Vignesh, >Can you share with me the script you used to generate the data & the ddl of >the table, so that it will help me check that >scenario you faced the >problem. Unfortunately I can't directly share it (considered company IP), though having said that it's only doing something that is

Re: Parallel copy

2020-08-31 Thread vignesh C
On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow wrote: > - Parallel Copy with 1 worker ran slower than normal Copy in a couple > of cases (I did question if allowing 1 worker was useful in my patch > review). Thanks Greg for your review & testing. I had executed various tests with 1

Re: Parallel copy

2020-08-27 Thread Amit Kapila
oughts? > > > > > > Hi Vignesh, > > > > > > I don't really have any further comments on the code, but would like > > > to share some results of some Parallel Copy performance tests I ran > > > (attached). > > > > > > The

Re: Parallel copy

2020-08-27 Thread vignesh C
ther comments on the code, but would like > > to share some results of some Parallel Copy performance tests I ran > > (attached). > > > > The tests loaded a 5GB CSV data file into a 100 column table (of > > different data types). The following were varied as part of t

Re: Parallel copy

2020-08-26 Thread Amit Kapila
On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow wrote: > > > I have attached new set of patches with the fixes. > > Thoughts? > > Hi Vignesh, > > I don't really have any further comments on the code, but would like > to share some results of some Parallel Copy perfo

Re: Parallel copy

2020-08-26 Thread Greg Nancarrow
> I have attached new set of patches with the fixes. > Thoughts? Hi Vignesh, I don't really have any further comments on the code, but would like to share some results of some Parallel Copy performance tests I ran (attached). The tests loaded a 5GB CSV data file into a 100 column

Re: Parallel copy

2020-08-16 Thread Greg Nancarrow
nd/or the read line_size ("dataSize") doesn't actually correspond to the read line state? (sorry, still not 100% convinced that the synchronization and checks are safe in all cases) (3) v3-0006-Parallel-Copy-For-Binary-Format-Files.patch >raw_buf is not used in parallel copy, in

Re: Parallel copy

2020-08-11 Thread Greg Nancarrow
any execution problems other than some option validation and associated error messages on boundary cases. One general question that I have: is there a user benefit (over the normal non-parallel COPY) to allowing "COPY ... FROM ... WITH (PARALLEL 1)"? My following comments are broken dow

Re: Parallel copy

2020-08-05 Thread vignesh C
rebased the patch over head & attached. > >> > >I rebased v2-0006-Parallel-Copy-For-Binary-Format-Files.patch. > > > >Putting together all the patches rebased on to the latest commit > >b8fdee7d0ca8bd2165d46fb1468f75571b706a01. Patches from 0001 to 0005 >
