[sorry to all for not snipping the mail]

> D. Bolliger on Monday, 4 September 2006 03:57:
> > Geetha Weerasooriya on Sunday, 3 September 2006 16:22:
> > > Dear Mr. Dani,
> > >
> > > Thank you very much for the reply. I understand that my question was
> > > not clear, so I will explain my problem a little more.
> > >
> > > Below is a sample of my data set. The actual data file is much
> > > larger (about 3 GB).
> >
> > [please don't top post]
> >
> > Hm, I still need more info (and I hope others do as well)...
> >
> > > Date         Time     Veh ID  Longitude    Latitude     Speed  Odometer  Route No  distance  flag
> > >  2003/11/12  8:32:43  10      139.6368501  35.51527949  23.6   27406416  1         2177      3
> > >  2003/11/12  8:32:44  10      139.6368501  35.51527949  23.6   27406416  1         2177      3
> > >  2003/11/12  8:32:45  10      139.636606   35.51526727  27.6   27406436  1         2155      3
> >
> > I see 9 labels (Date, Time, Veh ID, Longitude, Latitude, Speed, Odometer,
> > Route No, distance flag), but 10 data fields (2003/11/12, 8:32:43, 10,
> > 139.6368501, 35.51527949, 23.6, 27406416, 1, 2177, 3) ???
> >
> > > The Points A,B and C are defined.  In other words, the coordinates of
> > > these points are known.
> > >
> > > I attached a flag, depending on the distance of the bus from the
> > > origin of the route, as follows:
> > >
> > > if the distance from the origin is 0 to 50, the flag is 0
> > > if it is 50 to 2000, the flag is 1
> > > if it is 2000 to 2070, the flag is 2
> > > if it is 2070 to 5775, the flag is 3
> > > if it is 5775 to 5830, the flag is 4.
> > >
> > > Point A (origin) lies between 0 and 50
> > > Point B (destination) lies between 5775 and 5830
> > > Point C (turning to the other route) lies between 2000 and 2070
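A minimal Perl sketch of the distance-to-flag mapping described above. The name flag_for_distance and the boundary handling are assumptions: the mail does not say whether a boundary value such as 50 belongs to flag 0 or flag 1, so here each range includes its lower bound and excludes its upper bound (only 5830 is included in the last range).

```perl
use strict;
use warnings;

# Hypothetical lookup table: [lower bound, upper bound, flag].
# Assumption: a range includes its lower bound and excludes its upper
# bound; only the last range (5775..5830) also includes its upper bound.
my @ranges = (
    [    0,   50, 0 ],    # Point A (origin)
    [   50, 2000, 1 ],
    [ 2000, 2070, 2 ],    # Point C (turning to the other route)
    [ 2070, 5775, 3 ],
    [ 5775, 5830, 4 ],    # Point B (destination)
);

# Return the flag for a distance from the origin, or undef if the
# distance lies outside 0..5830.
sub flag_for_distance {
    my ($distance) = @_;
    for my $i (0 .. $#ranges) {
        my ($low, $high, $flag) = @{ $ranges[$i] };
        my $is_last = ($i == $#ranges);
        return $flag
            if $distance >= $low
            && ($distance < $high || ($is_last && $distance == $high));
    }
    return undef;
}

print join(',', map { flag_for_distance($_) } 1900, 2007, 2073), "\n";   # 1,2,3
```

The three distances are taken from the sample data further down, where they carry the flags 1, 2 and 3.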
> > >
> > > If the bus has gone from the origin to the destination, we get a series
> > > of 0, 1, 2, 3 and 4.
> >
> > Which data fields group the records from which you calculate the
> > sequence? Are these the three fields Date / Veh ID / Route No ?
> > (If a bus drives more than once a day: how to separate these drives?)
> >
> > Can the sequence also be something like 0,1,2,2,3,3,4 etc?
> > (It seems from the first two records: same date, same Veh ID, same Route
> > No)
> >
> > Is it correct that the group of records from which you calculate the
> > sequence is ordered in the file, but can be intermixed with (also
> > ordered) lines of other record groups?
> >
> > Is there a maximum number of records per group?
> >
> > > If the bus is coming from the other direction, the sequence is 4, 3, 2, 1
> > > and 0. I want to separate these two kinds of data into two files. If the
> > > bus has turned at point C, we get a sequence of 0, 1 and 2 only. I don't
> > > need this data.
> >
> > So, you have these three cases, correct?
> > - sequence 0..4 (into first file)
> > - sequence 4..0 (into second file)
> > - sequence 0..2 (discard)
> > (whereby a sequence value can be repeated)
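The three cases can be checked by collapsing repeated flags and comparing the result. This is a hypothetical sketch (collapse_runs and classify_drive are illustrative names), assuming the flags of one drive are already collected in order and that only complete 0..4 or 4..0 runs are kept.

```perl
use strict;
use warnings;

# Collapse runs of repeated flags, e.g. (0,0,1,1,1,2) -> (0,1,2).
sub collapse_runs {
    my @out;
    for my $f (@_) {
        push @out, $f unless @out && $out[-1] == $f;
    }
    return @out;
}

# Classify one drive by its flag sequence (hypothetical rule: only the
# exact collapsed sequences 0,1,2,3,4 and 4,3,2,1,0 are kept).
sub classify_drive {
    my $seq = join ',', collapse_runs(@_);
    return 'forward'  if $seq eq '0,1,2,3,4';
    return 'backward' if $seq eq '4,3,2,1,0';
    return 'discard';
}

print classify_drive(0, 0, 1, 2, 2, 3, 3, 4), "\n";   # forward
print classify_drive(4, 3, 3, 2, 1, 0, 0), "\n";      # backward
print classify_drive(0, 1, 1, 2), "\n";               # discard
```

Collapsing first means the number of repeated records per flag (which depends on bus stops and speed) does not matter.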
> >
> > > My problem is how to split this based on the flag I have already
> > > attached.
> >
> > [...]
> >
> > Could you also post about 30 lines of sample data, without line breaks
> > within the records, and, if possible, indicate which lines form a sequence?
> >
> > Dani

[offlist:] Geetha Weerasooriya on Sunday, 4 September 2006:
> Thanks for your mail. I am so happy that you understand my problem
> exactly; you have understood it correctly, even beyond what I have
> explained.

Hi

Please reply to the list, try to answer inline (as I do),
and don't send binary attachments :-)

To read this mail, turn off automatic line wrapping in your mail client.

> Sorry, there are 10 data fields, and I have given 10 labels: Distance and
> Flag are two separate labels. Distance means the distance from the origin
> of the route (Point A) to the bus location.

Sample data, exported as CSV:

"Date ","Time ","VID","Longitude","Latitude","Speed","Odometer reading","Route No","Distance","Flag"
" 2003/11/12",08:58:12,10,139.63,35.51,11.8,27412848,1,1900,1
" 2003/11/12",08:58:13,10,139.63,35.51,11.8,27412848,1,1900,1
" 2003/11/12",08:58:14,10,139.63,35.51,16.6,27412856,1,1900,1
" 2003/11/12",08:58:15,10,139.63,35.51,20.6,27412864,1,1900,1
" 2003/11/12",08:58:16,10,139.63,35.51,25.4,27412876,1,1900,1
" 2003/11/12",08:58:17,10,139.63,35.51,25.4,27412876,1,1900,1
" 2003/11/12",08:58:18,10,139.63,35.51,26,27412884,1,1936,1
" 2003/11/12",08:58:19,10,139.63,35.51,28.4,27412900,1,1943,1
" 2003/11/12",08:58:20,10,139.63,35.51,31.8,27412908,1,1952,1
" 2003/11/12",08:58:21,10,139.63,35.51,35.2,27412924,1,1952,1
" 2003/11/12",08:58:22,10,139.63,35.51,35.2,27412924,1,1952,1
" 2003/11/12",08:58:23,10,139.63,35.51,34.8,27412932,1,1960,1
" 2003/11/12",08:58:24,10,139.63,35.51,35.2,27412944,1,1997,1
" 2003/11/12",08:58:25,10,139.64,35.51,35.2,27412960,1,2007,2
" 2003/11/12",08:58:26,10,139.64,35.51,35.2,27412960,1,2007,2
" 2003/11/12",08:58:27,10,139.64,35.51,35.4,27412972,1,2015,2
" 2003/11/12",08:58:28,10,139.64,35.51,35.8,27412984,1,2015,2
" 2003/11/12",08:58:29,10,139.64,35.51,35.8,27412998,1,2015,2
" 2003/11/12",08:58:30,10,139.64,35.52,35.6,27413008,1,2034,2
" 2003/11/12",08:58:31,10,139.64,35.52,35.6,27413008,1,2034,2
" 2003/11/12",08:58:32,10,139.64,35.52,34.6,27413022,1,2073,3
" 2003/11/12",08:58:33,10,139.64,35.52,34,27413032,1,2083,3
" 2003/11/12",08:58:34,10,139.64,35.52,33.8,27413044,1,2083,3
" 2003/11/12",08:58:35,10,139.64,35.52,33,27413052,1,2092,3
" 2003/11/12",08:58:36,10,139.64,35.52,33,27413052,1,2092,3
" 2003/11/12",08:58:37,10,139.64,35.52,27,27413060,1,2099,3
" 2003/11/12",08:58:38,10,139.64,35.52,22.2,27413064,1,2107,3
" 2003/11/12",08:58:39,10,139.64,35.52,16.8,27413068,1,2107,3
" 2003/11/12",08:58:40,10,139.64,35.52,12,27413072,1,2122,3
" 2003/11/12",08:58:41,10,139.64,35.52,12,27413072,1,2122,3
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,3

I see that there is an entry every second for each bus.
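The sampling interval can be checked from the Time column by converting each HH:MM:SS value to seconds and taking differences; in the sample above, consecutive records are one second apart. The helper names to_seconds and intervals are hypothetical, and the sketch does not handle drives that cross midnight.

```perl
use strict;
use warnings;

# Convert "HH:MM:SS" (or "H:MM:SS") to seconds since midnight.
sub to_seconds {
    my ($h, $m, $s) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

# Differences in seconds between consecutive time stamps
# (a drive crossing midnight is not handled).
sub intervals {
    my @secs = map { to_seconds($_) } @_;
    return map { $secs[$_] - $secs[$_ - 1] } 1 .. $#secs;
}

my @times = qw(08:58:12 08:58:13 08:58:14 08:58:15);
print join(',', intervals(@times)), "\n";    # 1,1,1
```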

> When calculating the sequence, I have considered only the Distance data. As
> explained in my previous mail, depending on the distance I have assigned
> the numbers 0, 1, 2, 3 and 4, which are in the last column.
>
> In this case I am considering only one route, so the route number need not
> be considered. As you have correctly understood, I also have to consider
> the Date and VID when recognizing the sequence.
>
> Yes, it is true that the bus travels along the route several times a day,
> sometimes more than 20 times. I am still thinking about how to separate
> these drives. I want to give them names such as Trajectory1, Trajectory2,
> and so on.
>
> The sequence is exactly 0,0,0,...,0, 1,1,1,...,1, 2,2,...,2,
> 3,3,...,3, 4,4,...,4, like that. As I have observed, one sequence has
> on average 1500 records, because the bus stops at bus halts and
> intersections, and then the same values are recorded for a long time.
> Since the number of records per group depends on the speed of the bus,
> bus stops, etc., there is no maximum number of records per group.
>
> Yes the group of records from which I calculate the sequence is ordered in
> the file.

Clear so far.

> Yes, I have the 3 cases you have identified. But when the bus reaches
> the end of the route, it changes direction, and then we get
> 0,1,2,3,4,3,2,1,0,1,2,... and so on.

So, we have to determine somehow when the bus changes direction.
I think "Distance" can be considered for this purpose, although not
in an exact way (there are lines in the data with identical "Distance"
[and "Flag"]).
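One possible single-pass approach, using only the flags rather than Distance: start a new trajectory when Date or VID changes, or when the flag starts moving in the opposite direction. Like the skeleton script in this mail, the record at a turning point ends up in both trajectories, so a 4..0 drive still begins with 4. This is a hypothetical sketch: the record layout [date, vid, flag] and the name split_trajectories are assumptions, and records are assumed to be ordered per bus.

```perl
use strict;
use warnings;

# Hypothetical sketch: split records (ordered per bus) into trajectories.
# Each record is an array ref [date, vid, flag]; other fields are ignored.
sub split_trajectories {
    my @records = @_;
    my (@trajectories, $key, $prev, $direction);
    for my $rec (@records) {
        my ($date, $vid, $flag) = @$rec;
        my $rec_key = "$date|$vid";
        if (!defined $key || $rec_key ne $key) {
            push @trajectories, [];    # new bus or new day
            $direction = 0;            # direction not known yet
        }
        elsif ($flag != $prev->[2]) {
            my $step = $flag > $prev->[2] ? 1 : -1;
            if ($direction && $step != $direction) {
                # Direction reversed: start a new trajectory that
                # also contains the turning-point record.
                push @trajectories, [$prev];
            }
            $direction = $step;
        }
        push @{ $trajectories[-1] }, $rec;
        ($key, $prev) = ($rec_key, $rec);
    }
    return @trajectories;
}

my @flags   = (0, 1, 2, 3, 4, 4, 3, 2, 1, 0, 0, 1, 2, 1, 0);
my @records = map { ['2003/11/12', 10, $_] } @flags;
my $n = 0;
for my $trajectory (split_trajectories(@records)) {
    $n++;
    print "Trajectory$n: ", join(',', map { $_->[2] } @$trajectory), "\n";
}
```

This prints Trajectory1: 0,1,2,3,4,4, then Trajectory2: 4,3,2,1,0,0, Trajectory3: 0,1,2 and Trajectory4: 2,1,0. The last two correspond to a turn at Point C and would be discarded.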

[...]
> Thank you very much for taking trouble in this connection.

I won't be able to present "the solution"; there are still decisions
you will have to make yourself.

Following is a skeleton for *one* possible (and hardly the best) solution:

1. Export the xls files as csv, without label lines (you have two of them)
   (The script below uses the <DATA> filehandle with your sample data;
   I inserted a first record, and several records at the end, marked
   with 'xxx')

2. The following script uses data from *one* bus route
   (with possibly incomplete forth and back drives)

3. Adapt the following script to your needs:
   (In several ways it is not very performant, and it does not use modules.
   It's also terribly inelegant. But it *seems* to do
   more or less what you want...
   NOTE:
   - It requires that, at the end/return points, more than one line
     with the same flag is present.
   - One line at the end/return points is output/discarded twice.
   - The "split" at the end/return points is made after the first line
     with identical flags.
   NOT THOROUGHLY TESTED!!!
   )
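Note that the script below splits on plain commas, so the date field keeps its quotes and leading space (" 2003/11/12"). A hedged alternative, assuming the CSV export looks like the sample, is the core module Text::ParseWords, whose parse_line() removes the quotes; parse_record is a hypothetical helper name.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my @fields = qw/ date time vid longitude latitude speed odometer_reading
                 route_no distance flag /;

# Hypothetical helper: turn one CSV line into a hash ref keyed by @fields.
# parse_line(',', 0, ...) splits on commas and removes the double quotes;
# surrounding whitespace (e.g. in " 2003/11/12") is trimmed afterwards.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @values = parse_line(',', 0, $line);
    for my $v (@values) {
        $v =~ s/^\s+|\s+$//g if defined $v;
    }
    my %rec;
    @rec{@fields} = @values;
    return \%rec;
}

my $rec = parse_record(qq{" 2003/11/12",08:58:12,10,139.63,35.51,11.8,27412848,1,1900,1\n});
print "$rec->{date} $rec->{time} distance=$rec->{distance} flag=$rec->{flag}\n";
```

For the sample line this prints "2003/11/12 08:58:12 distance=1900 flag=1".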

Script output with __DATA__ data:

print records to forward file (0..4)
print records to backwards file (4..0)
discard (0..2)
discard (2..0)

*sigh*


#!/usr/bin/perl

use strict;
use warnings;

my @fields=qw/ date time vid longitude latitude speed odometer_reading
               route_no distance flag /;

my @records; # accumulate lines here until output or discarded

### main
#
while (<DATA>) {
  chomp;
  my %line;

  @line{@fields}=(split /,/);   # hash slice: field names => values of $_
#  warn map {"<$_=>$line{$_}>\n"} keys %line;

  accumulate(\@records, \%line);
}
output(\@records, calc(\@records)) if @records; # output the last group of records...


### calculate $direction, $first_flag and $last_flag from records
#
sub calc {
  my $records=shift;

  # what direction (0=unspecified, -1=backwards, 1=forward)
  #
  # use the passed array ref, not the global @records
  my ($first_flag, $last_flag)=($records->[0]{flag}, $records->[-1]{flag});
  my $direction=($first_flag == $last_flag) ?  0
               :($first_flag <  $last_flag) ?  1
               :                              -1;

  return ($direction, $first_flag, $last_flag);
}


### print or discard records
#
sub output {
  my ($records, $direction, $first_flag, $last_flag)=@_;
  my ($low, $high)=sort { $a <=> $b } ($first_flag, $last_flag); # numeric sort

  my $discard=($low != 0 or $high != 4) ? 1 : 0;

  if ($discard) {
    warn "discard ($first_flag..$last_flag)\n";
  }
  else {
    if ($direction > 0) {
      warn "print records to forward file ($first_flag..$last_flag)\n";
    }
    else { # < 0
      warn "print records to backwards file ($first_flag..$last_flag)\n";
    }
  }

  my $last_record=$records->[-1];
  @$records=();
  return $last_record;
}


### accumulates lines until output or discarded
#
sub accumulate {
  my ($records, $line)=@_;

#  warn scalar(@$records), " records accumulated\n";

  if (@$records) {
    my ($direction, $first_flag, $last_flag)=calc($records);
#    warn "direction=$direction, first=$first_flag, last=$last_flag\n";

    if ($direction) {
      if ($direction > 0 and $line->{flag} >= $last_flag) { # forward unfinished
        push @$records, $line;
      }
      elsif ($direction < 0 and $line->{flag} <= $last_flag) { # backwards unfinished
        push @$records, $line;
      }
      else { # direction change => output/discard

        my $last_record=output($records, $direction, $first_flag, $last_flag);

        push @$records, $last_record; # this line is output twice on a direction change!
        push @$records, $line;
      }
    }
    else {
      push @$records, $line;
    }
  }
  else {
    push @$records, $line;
  }
}


__DATA__
" xxxx/11/12",08:58:12,10,139.63,35.51,11.8,27412848,1,1900,0
" 2003/11/12",08:58:12,10,139.63,35.51,11.8,27412848,1,1900,1
" 2003/11/12",08:58:13,10,139.63,35.51,11.8,27412848,1,1900,1
" 2003/11/12",08:58:14,10,139.63,35.51,16.6,27412856,1,1900,1
" 2003/11/12",08:58:15,10,139.63,35.51,20.6,27412864,1,1900,1
" 2003/11/12",08:58:16,10,139.63,35.51,25.4,27412876,1,1900,1
" 2003/11/12",08:58:17,10,139.63,35.51,25.4,27412876,1,1900,1
" 2003/11/12",08:58:18,10,139.63,35.51,26,27412884,1,1936,1
" 2003/11/12",08:58:19,10,139.63,35.51,28.4,27412900,1,1943,1
" 2003/11/12",08:58:20,10,139.63,35.51,31.8,27412908,1,1952,1
" 2003/11/12",08:58:21,10,139.63,35.51,35.2,27412924,1,1952,1
" 2003/11/12",08:58:22,10,139.63,35.51,35.2,27412924,1,1952,1
" 2003/11/12",08:58:23,10,139.63,35.51,34.8,27412932,1,1960,1
" 2003/11/12",08:58:24,10,139.63,35.51,35.2,27412944,1,1997,1
" 2003/11/12",08:58:25,10,139.64,35.51,35.2,27412960,1,2007,2
" 2003/11/12",08:58:26,10,139.64,35.51,35.2,27412960,1,2007,2
" 2003/11/12",08:58:27,10,139.64,35.51,35.4,27412972,1,2015,2
" 2003/11/12",08:58:28,10,139.64,35.51,35.8,27412984,1,2015,2
" 2003/11/12",08:58:29,10,139.64,35.51,35.8,27412998,1,2015,2
" 2003/11/12",08:58:30,10,139.64,35.52,35.6,27413008,1,2034,2
" 2003/11/12",08:58:31,10,139.64,35.52,35.6,27413008,1,2034,2
" 2003/11/12",08:58:32,10,139.64,35.52,34.6,27413022,1,2073,3
" 2003/11/12",08:58:33,10,139.64,35.52,34,27413032,1,2083,3
" 2003/11/12",08:58:34,10,139.64,35.52,33.8,27413044,1,2083,3
" 2003/11/12",08:58:35,10,139.64,35.52,33,27413052,1,2092,3
" 2003/11/12",08:58:36,10,139.64,35.52,33,27413052,1,2092,3
" 2003/11/12",08:58:37,10,139.64,35.52,27,27413060,1,2099,3
" 2003/11/12",08:58:38,10,139.64,35.52,22.2,27413064,1,2107,3
" 2003/11/12",08:58:39,10,139.64,35.52,16.8,27413068,1,2107,3
" 2003/11/12",08:58:40,10,139.64,35.52,12,27413072,1,2122,3
" 2003/11/12",08:58:41,10,139.64,35.52,12,27413072,1,2122,3
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,3
" xxxx/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,4
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,4
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,3
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,2
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,1
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,0
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,0
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,1
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,2
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,2
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,2
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,1
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,1
" 2003/11/12",08:58:42,10,139.64,35.52,6.6,27413172,1,2123,0

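The skeleton above only warns where it would print. A minimal sketch of the printing step: write_records is a hypothetical helper that writes hash-ref records back as CSV lines in @fields order. In the real script one would open two files (for example forward.csv and backward.csv; the names are assumptions) and call it from output() instead of warn(); the demo writes to an in-memory filehandle instead.

```perl
use strict;
use warnings;

my @fields = qw/ date time vid longitude latitude speed odometer_reading
                 route_no distance flag /;

# Hypothetical helper: print hash-ref records (as built by the skeleton
# script) to an open filehandle, one CSV line each, in @fields order.
sub write_records {
    my ($fh, $records) = @_;
    for my $rec (@$records) {
        print {$fh} join(',', @{$rec}{@fields}), "\n";
    }
    return;
}

# Demo: one record, parsed the same way as in the skeleton script,
# written to an in-memory filehandle instead of a real file.
my %rec;
@rec{@fields} = split /,/, q{" 2003/11/12",08:58:12,10,139.63,35.51,11.8,27412848,1,1900,1};
open my $out, '>', \my $buffer or die "cannot open in-memory file: $!";
write_records($out, [ \%rec ]);
close $out;
print $buffer;
```

Since the skeleton keeps the quotes from the original CSV in the date field, the written lines look exactly like the input lines.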
