[sorry to all for not snipping the mail]

> D. Bolliger on Monday, 4 September 2006 03:57:
> > Geetha Weerasooriya on Sunday, 3 September 2006 16:22:
> > > Dear Mr.Dani,
> > >
> > > Thank you very much for the reply. I understand that but question is
> > > not clear. I will explain my problem little more.
> > >
> > > Below is the sample of my data set. Actual data file is very much
> > > larger.( about 3 GB)
> >
> > [please don't top post]
> >
> > Hm, I need still more info (and I hope others as well)...
> >
> > > Date Time Veh ID Longitude Latitude Speed Odometer
> > > Route No
> > > distance flag
> > > 2003/11/12 8:32:43 10 139.6368501 35.51527949 23.6
> > > 27406416 1 2177 3
> > > 2003/11/12 8:32:44 10 139.6368501 35.51527949 23.6
> > > 27406416 1 2177 3
> > > 2003/11/12 8:32:45 10 139.636606 35.51526727 27.6
> > > 27406436 1 2155 3
> >
> > I see 9 labels (Date, Time, Veh ID, Longitude, Latitude, Speed, Odometer,
> > Route No, distance flag), but 10 data fields (2003/11/12, 8:32:43, 10,
> > 139.6368501, 35.51527949, 23.6, 27406416, 1, 2177, 3) ???
> >
> > > The Points A,B and C are defined. In other words, the coordinates of
> > > these points are known.
> > >
> > > I tried to attach a flag
> > >
> > > depending on the distance to the bus from the
> > > origin of the route as follows:
> > >
> > > if the distance from origin is 0 to 50, 0
> > > if it is 50 to 2000, 1
> > > if it is 2000 to 2070, 2
> > > if it is 2070 to 5775 , 3
> > > if 5775 to 5830 flag is 4.
> > >
> > > Point A(origin) lies between 0-50
> > > Point B(destination) lies between 5775-5830
> > > Point C(turning to other route) lies between 2000-2070
> > >
> > > If bus has gone from origin to destination we get a series of 0,1,2,3
> > > and 4.
> >
> > Which data fields group the records from which you calculate the
> > sequence? Are these the three fields Date / Veh ID / Route No ?
> > (If a bus drives more than once a day: how to separate these drives?)
> >
> > Can the sequence also be something like 0,1,2,2,3,3,4 etc?
> > (It seems from the first two records: same date, same Veh ID, same Route
> > No)
> >
> > Is it correct that the group of records from which you calculate the
> > sequence is ordered in the file, but can be intermixed with (also
> > ordered) lines of other record groups?
> >
> > Is there a maximum number of records per group?
> >
> > > If bus is coming other direction, sequence is 4,3,2,1 and 0. I want to
> > > separate these two data into two files. If it has turned at point C we
> > > get sequence of 0,1 and 2 only. This data I don't need.
> >
> > So, you have these three cases, correct?
> > - sequence 0..4 (into first file)
> > - sequence 4..0 (into second file)
> > - sequence 0..2 (discard)
> > (whereby a sequence value can be repeated)
> >
> > > My problem is how to split this based on the flag I have already
> > > attached.
> >
> > [...]
> >
> > Could you also post about 30 lines of sample data, without line breaks
> > within the records, and eventually indicate which lines form a sequence?
> >
> > Dani
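
As an aside, the distance-to-flag rule quoted above can be written down as a small lookup. This is only a sketch under assumptions: the names flag_for and @bands are made up for illustration, and I assume each range includes its lower bound and excludes its upper bound (the quoted mail does not say which way the boundaries go). Distances outside 0..5830 get no flag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table: [exclusive upper bound of the band, flag].
# The bounds are the ones given in the quoted mail; the half-open
# interpretation (lower <= distance < upper) is an assumption.
my @bands = ([50, 0], [2000, 1], [2070, 2], [5775, 3], [5830, 4]);

# Return the flag for a distance from the origin, or undef if the
# distance lies outside all bands.
sub flag_for {
    my ($distance) = @_;
    return if $distance < 0;
    for my $band (@bands) {
        my ($upper, $flag) = @$band;
        return $flag if $distance < $upper;
    }
    return;
}

# Print the flag for a few distances taken from the sample data.
for my $distance (1997, 2007, 2073) {
    print "$distance => ", flag_for($distance), "\n";
}
```

With the posted sample rows this agrees with the Flag column: a distance of 1997 gives flag 1, 2007 gives flag 2, and 2073 gives flag 3.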
[offlist:]
Geetha Weerasooriya on Sunday, 4 September 2006:
> Thanks for your mail. I am so happy that you are exactly understanding my
> problem and you have already understood it correctly even beyond what I
> have explained.

Hi

Please answer to the list, try to answer inline (as I do), and don't send
binary attachments :-)

To read this mail, turn line breaks in your mail client off.

> Sorry there are 10 data fields and I have given the 10 labels. Distance and
> Flag are two labels. Distance means distance from the origin of the
> route(Point A) to the bus location.

Sample data, exported as CSV:

"Date ","Time ","VID","Longitude","Latitude","Speed","Odometer reading","Route No","Distance","Flag"
" 2003/11/12",08:58:12,10,139.63,35.51,11.8,27412848,1,1900,1
" 2003/11/12",08:58:13,10,139.63,35.51,11.8,27412848,1,1900,1
" 2003/11/12",08:58:14,10,139.63,35.51,16.6,27412856,1,1900,1
" 2003/11/12",08:58:15,10,139.63,35.51,20.6,27412864,1,1900,1
" 2003/11/12",08:58:16,10,139.63,35.51,25.4,27412876,1,1900,1
" 2003/11/12",08:58:17,10,139.63,35.51,25.4,27412876,1,1900,1
" 2003/11/12",08:58:18,10,139.63,35.51,26,27412884,1,1936,1
" 2003/11/12",08:58:19,10,139.63,35.51,28.4,27412900,1,1943,1
" 2003/11/12",08:58:20,10,139.63,35.51,31.8,27412908,1,1952,1
" 2003/11/12",08:58:21,10,139.63,35.51,35.2,27412924,1,1952,1
" 2003/11/12",08:58:22,10,139.63,35.51,35.2,27412924,1,1952,1
" 2003/11/12",08:58:23,10,139.63,35.51,34.8,27412932,1,1960,1
" 2003/11/12",08:58:24,10,139.63,35.51,35.2,27412944,1,1997,1
" 2003/11/12",08:58:25,10,139.64,35.51,35.2,27412960,1,2007,2
" 2003/11/12",08:58:26,10,139.64,35.51,35.2,27412960,1,2007,2
" 2003/11/12",08:58:27,10,139.64,35.51,35.4,27412972,1,2015,2
" 2003/11/12",08:58:28,10,139.64,35.51,35.8,27412984,1,2015,2
" 2003/11/12",08:58:29,10,139.64,35.51,35.8,27412998,1,2015,2
" 2003/11/12",08:58:30,10,139.64,35.52,35.6,27413008,1,2034,2
" 2003/11/12",08:58:31,10,139.64,35.52,35.6,27413008,1,2034,2
" 2003/11/12",08:58:32,10,139.64,35.52,34.6,27413022,1,2073,3
" 2003/11/12",08:58:33,10,139.64,35.52,34,27413032,1,2083,3
" 2003/11/12",08:58:34,10,139.64,35.52,33.8,27413044,1,2083,3
" 2003/11/12",08:58:35,10,139.64,35.52,33,27413052,1,2092,3
" 2003/11/12",08:58:36,10,139.64,35.52,33,27413052,1,2092,3
" 2003/11/12",08:58:37,10,139.64,35.52,27,27413060,1,2099,3
" 2003/11/12",08:58:38,10,139.64,35.52,22.2,27413064,1,2107,3
" 2003/11/12",08:58:39,10,139.64,35.52,16.8,27413068,1,2107,3
" 2003/11/12",08:58:40,10,139.64,35.52,12,27413072,1,2122,3
" 2003/11/12",08:58:41,10,139.64,35.52,12,27413072,1,2122,3
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,3

I see that there is an entry every second for every bus.

> When calculating the sequence, I have considered only the Distance data. As
> explained in my previous mail, depending on the distance I have given
> numbers 0,1,2,3 and 4 and which is in last column.
>
> In this case I am considering only one route and hence route no need not
> be considered. As you have correctly understood, I have to consider the
> Date and VID also in recognizing the sequence.
>
> Yes, it is true that the bus has moved along the route several times in a
> day. Sometimes more than 20 times. Still I am thinking how to separate
> these drives. I want to give them a name such as Trajectory1, Trajectory 2
> , so on.
>
> The sequence is exactly 0,0,0,0,……0, 1,1,1, …………1, 2,2,…………..2,
> 3,3,………….3, 4,4,……………..4. like that. As I have observed one sequence has
> in average 1500 records, because bus stops at bus halts and intersections
> and then same recoding is there for a long time. Since the number of
> records per group depends on the speed of the bus, bus stops ect, there is
> no maximum number of records per group.
>
> Yes the group of records from which I calculate the sequence is ordered in
> the file.

clear so far.

> Yes I have the 3 cases you have identified.
> But the case is when the bus
> goes to the end of the route it change the direction and then we get
> 0,1,2,3,4,3,2,1,0,1,2,…that way.

So, we have to determine somehow when the bus changes direction.
I think "Distance" can be considered for this purpose, although not in an
exact way (there are lines in the data with identical "Distance" [and "Flag"]).

[...]

> Thank you very much for taking trouble in this connection.

I won't be able to present "the solution"; there are still decisions you have
to make yourself. What follows is a skeleton for *one* possible, hardly the
best, solution:

1. Export the xls files as csv, without label lines (you have two of them).
   (The script below uses the <DATA> filehandle with your sample data; I
   inserted a first record, and several records at the end, marked with 'xxx')

2. The following script uses data from *one* bus route (with possibly
   incomplete back-and-forth drives).

3. Adapt the following script to your needs.
   (It is in several ways not very efficient and does not use modules.
   It's also terribly inelegant. But it *seems* to do more or less what
   you want...
   NOTE:
   - It requires that, at the end/return points, more than one line with the
     same flag is present.
   - One line at end/return points is output/discarded twice.
   - The "split" at the end/return points is made after the first line with
     identical flags.
   NOT THOROUGHLY TESTED!!!
   )

Script output with __DATA__ data:

print records to forward file (0..4)
print records to backwards file (4..0)
discard (0..2)
discard (2..0)

*sigh*

#!/usr/bin/perl

use strict;
use warnings;

my @fields=qw/ date time vid longitude latitude speed odometer_reading
               route_no distance flag /;

my @records; # accumulate lines here until output or discarded

### main
#
while (<DATA>) {
  chomp;
  my %line;
  @line{@fields}=(split /,/); # hash slice: one key per CSV column
  # warn map {"<$_=>$line{$_}>\n"} keys %line;
  accumulate(\@records, \%line);
}
output (\@records, calc(\@records)); # output the last group of records...

### calculate $direction, $first_flag and $last_flag from records
#
sub calc {
  my $records=shift;

  # what direction (0=unspecified, -1=backwards, 1=forward)
  #
  # use the array reference passed in, not the file-level @records
  my ($first_flag, $last_flag)=($records->[0]{flag}, $records->[-1]{flag});
  my $direction=($first_flag == $last_flag)
    ? 0
    : ($first_flag < $last_flag) ? 1 : -1;
  return ($direction, $first_flag, $last_flag);
}

### print or discard records
#
sub output {
  my ($records, $direction, $first_flag, $last_flag)=@_;

  # numeric sort; a plain sort would compare the flags as strings
  my ($low, $high)=sort { $a <=> $b } ($first_flag, $last_flag);
  my $discard=($low != 0 or $high != 4) ? 1 : 0;

  if ($discard) {
    warn "discard ($first_flag..$last_flag)\n";
  }
  else {
    if ($direction > 0) {
      warn "print records to forward file ($first_flag..$last_flag)\n";
    }
    else { # < 0
      warn "print records to backwards file ($first_flag..$last_flag)\n";
    }
  }

  my $last_record=$records->[-1];
  @$records=();
  return $last_record;
}

### accumulates lines until output or discarded
#
sub accumulate {
  my ($records, $line)=@_;
  # warn "[EMAIL PROTECTED]";

  if (@$records) {
    my ($direction, $first_flag, $last_flag)=calc($records);
    # warn "direction=$direction, first=$first_flag, last=$last_flag\n";

    if ($direction) {
      if ($direction > 0 and $line->{flag} >= $last_flag) {
        # forward unfinished
        push @$records, $line;
      }
      elsif ($direction < 0 and $line->{flag} <= $last_flag) {
        # backwards unfinished
        push @$records, $line;
      }
      else {
        # direction change => output/discard
        my $last_record=output($records, $direction, $first_flag, $last_flag);
        push @$records, $last_record; # this line double output if direction change!
        push @$records, $line;
      }
    }
    else {
      # direction not yet known (all flags so far are identical)
      push @$records, $line;
    }
  }
  else {
    # first record of a group
    push @$records, $line;
  }
}

__DATA__
" xxxx/11/12",08:58:12,10,139.63,35.51,11.8,27412848,1,1900,0
" 2003/11/12",08:58:12,10,139.63,35.51,11.8,27412848,1,1900,1
" 2003/11/12",08:58:13,10,139.63,35.51,11.8,27412848,1,1900,1
" 2003/11/12",08:58:14,10,139.63,35.51,16.6,27412856,1,1900,1
" 2003/11/12",08:58:15,10,139.63,35.51,20.6,27412864,1,1900,1
" 2003/11/12",08:58:16,10,139.63,35.51,25.4,27412876,1,1900,1
" 2003/11/12",08:58:17,10,139.63,35.51,25.4,27412876,1,1900,1
" 2003/11/12",08:58:18,10,139.63,35.51,26,27412884,1,1936,1
" 2003/11/12",08:58:19,10,139.63,35.51,28.4,27412900,1,1943,1
" 2003/11/12",08:58:20,10,139.63,35.51,31.8,27412908,1,1952,1
" 2003/11/12",08:58:21,10,139.63,35.51,35.2,27412924,1,1952,1
" 2003/11/12",08:58:22,10,139.63,35.51,35.2,27412924,1,1952,1
" 2003/11/12",08:58:23,10,139.63,35.51,34.8,27412932,1,1960,1
" 2003/11/12",08:58:24,10,139.63,35.51,35.2,27412944,1,1997,1
" 2003/11/12",08:58:25,10,139.64,35.51,35.2,27412960,1,2007,2
" 2003/11/12",08:58:26,10,139.64,35.51,35.2,27412960,1,2007,2
" 2003/11/12",08:58:27,10,139.64,35.51,35.4,27412972,1,2015,2
" 2003/11/12",08:58:28,10,139.64,35.51,35.8,27412984,1,2015,2
" 2003/11/12",08:58:29,10,139.64,35.51,35.8,27412998,1,2015,2
" 2003/11/12",08:58:30,10,139.64,35.52,35.6,27413008,1,2034,2
" 2003/11/12",08:58:31,10,139.64,35.52,35.6,27413008,1,2034,2
" 2003/11/12",08:58:32,10,139.64,35.52,34.6,27413022,1,2073,3
" 2003/11/12",08:58:33,10,139.64,35.52,34,27413032,1,2083,3
" 2003/11/12",08:58:34,10,139.64,35.52,33.8,27413044,1,2083,3
" 2003/11/12",08:58:35,10,139.64,35.52,33,27413052,1,2092,3
" 2003/11/12",08:58:36,10,139.64,35.52,33,27413052,1,2092,3
" 2003/11/12",08:58:37,10,139.64,35.52,27,27413060,1,2099,3
" 2003/11/12",08:58:38,10,139.64,35.52,22.2,27413064,1,2107,3
" 2003/11/12",08:58:39,10,139.64,35.52,16.8,27413068,1,2107,3
" 2003/11/12",08:58:40,10,139.64,35.52,12,27413072,1,2122,3
" 2003/11/12",08:58:41,10,139.64,35.52,12,27413072,1,2122,3
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,3
" xxxx/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,4
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,4
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,3
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,2
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,1
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,0
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,0
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,1
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,2
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,2
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,2
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,1
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,1
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,0
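
The splitting idea can also be looked at on the Flag column alone, which may make the three cases (forward, backwards, discard) easier to check by hand. The sketch below is not the script's exact logic: it assumes the flags arrive in file order for one bus on one route, it shares only the single turning-point value between neighbouring runs, and the names split_runs and classify are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of flags into maximal runs that never reverse direction.
# Repeated flags stay in the current run. At a reversal the last flag
# before the turn closes the old run and also opens the new one.
sub split_runs {
    my @flags = @_;
    my (@runs, @current);
    my $dir = 0;    # 0 = not yet known, 1 = increasing, -1 = decreasing
    for my $flag (@flags) {
        if (@current) {
            my $step = $flag <=> $current[-1];
            if ($step && $dir && $step != $dir) {
                # reversal: store the finished run, restart at the turning point
                my $turn = $current[-1];
                push @runs, [@current];
                @current = ($turn);
                $dir = $step;
            }
            elsif ($step && !$dir) {
                $dir = $step;    # first change of flag fixes the direction
            }
        }
        push @current, $flag;
    }
    push @runs, [@current] if @current;
    return @runs;
}

# Classify a run by its first and last flag only.
sub classify {
    my ($run) = @_;
    my ($first, $last) = ($run->[0], $run->[-1]);
    return 'forward'  if $first == 0 && $last == 4;
    return 'backward' if $first == 4 && $last == 0;
    return 'discard';
}

# Example flag sequence: one full drive out, one full drive back,
# then a drive that turns at point C.
my @flags = (0,1,1,2,3,4,4,3,2,1,0,0,1,2,2,1,0);
for my $run (split_runs(@flags)) {
    printf "%s (%d..%d)\n", classify($run), $run->[0], $run->[-1];
}
```

For the example sequence this gives four runs: forward (0..4), backward (4..0), and two discarded runs (0..2) and (2..0). Numbering the kept runs as Trajectory 1, Trajectory 2 and so on would then only need a counter in the loop.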