Hi Devdatta,

Thanks for asking; happy to provide the documentation here:
*file* : name of the GTFS-RT feed file that was processed (there's one every 30 secs, so files are named by the epoch timestamp of download)
*num_vehicles* : number of vehicle-location entities in the feed
*feed_timestamp* : timestamp from the feed's header. If it's the same value as the last feed's, the feed is a duplicate. (Whoops, I just found out there were some 13k dupes in my data!)
*feed_time* : human-readable IST form of feed_timestamp
*incrementality* : another field from the feed's header
*bad_count* : number of entities that had missing or flawed lat-longs, like 0,0
*earliest* : timestamp of the earliest (farthest back in time) vehicle entity in the file. (Apart from the feed timestamp, each vehicle-location entity carries a timestamp of its own, because we can't assume all 700-odd vehicles sent in their lat-longs at exactly the same time. We're doing asynchronous business here.)
*diff1* : gap in seconds between earliest and feed_timestamp
*latest* : timestamp of the latest entity
*diff2* : gap in seconds between latest and feed_timestamp

What diff1 and diff2 tell me: how "dated" and how "recent" the information in the feed is. If diff2 were consistently too large, I wouldn't bother downloading a fresh feed every 30 secs, which is the standard minimum refresh interval as per the GTFS-RT spec.
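To make the above concrete, here is a minimal Python sketch of how these report columns could be computed. This is not my actual archival script; the function name, the `is_duplicate` flag, and the dict-shaped input (a feed already parsed into the form of the sample JSON further down in this mail) are assumptions for illustration only.

```python
# Sketch: compute the report fields for one GTFS-RT VehiclePositions feed.
# Assumes `feed` is a dict shaped like the sample JSON in this mail.
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # UTC + 5.5 hrs

def feed_report(feed, prev_feed_timestamp=None):
    header = feed["header"]
    feed_ts = int(header["timestamp"])

    entity_times = []
    bad_count = 0
    for e in feed.get("entity", []):
        v = e["vehicle"]
        pos = v.get("position", {})
        # count entities with missing or zeroed-out lat-longs like 0,0
        if not pos.get("latitude") or not pos.get("longitude"):
            bad_count += 1
        # each vehicle entity carries its own timestamp
        entity_times.append(int(v["timestamp"]))

    earliest, latest = min(entity_times), max(entity_times)
    return {
        "num_vehicles": len(entity_times),
        "feed_timestamp": feed_ts,
        # same header timestamp as the previous feed => duplicate feed
        "is_duplicate": feed_ts == prev_feed_timestamp,
        # human-readable IST form of the header timestamp
        "feed_time": datetime.fromtimestamp(feed_ts, tz=IST)
                             .strftime("%Y-%m-%d %H:%M:%S"),
        "incrementality": header["incrementality"],
        "bad_count": bad_count,
        "earliest": earliest,
        "diff1": feed_ts - earliest,  # how "dated" the feed's info is
        "latest": latest,
        "diff2": feed_ts - latest,    # how "recent" the feed's info is
    }
```

Feeding it each file's parsed JSON in sequence (passing the previous header timestamp along) would reproduce one row of the reports CSV per feed file.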
For reference, a sample JSON output from one feed (truncated):

{'header': {'gtfsRealtimeVersion': '2.0',
            'incrementality': 'FULL_DATASET',
            'timestamp': '1550596818'},
 'entity': [{'id': 'vehicle',
             'vehicle': {'trip': {'tripId': '6255', 'routeId': '225'},
                         'position': {'latitude': 28.610946655273438,
                                      'longitude': 76.980224609375,
                                      'speed': 0.0},
                         'timestamp': '1550596773',
                         'vehicle': {'id': 'DL1PD0716', 'label': 'DL1PD0716'}}},
            ...

Repeat-link to my code example: http://kyso.io/answerquest/delhi-gtfs-rt-feed-file-analysis

--
Cheers,
Nikhil VJ, Pune, India
http://nikhilvj.co.in

On Fri, Mar 22, 2019 at 9:53 AM Devdatta Tengshe <devda...@tengshe.in> wrote:

> Hi Nikhil,
>
> Thanks for sharing this data.
> I had a question about the 'delhi_vehicle_reports.csv' file.
>
> Is there any documentation about the fields in this file?
>
> I see the following headers:
>
>> file,num_vehicles,feed_timestamp,feed_time,incrementality,bad_count,earliest,diff1,latest,diff2
>
> and I'm wondering what they are.
>
> Regards,
> Devdatta
>
> On Thu, Mar 21, 2019 at 1:24 PM Nikhil VJ <nikhil...@gmail.com> wrote:
>
>> Hi folks,
>>
>> I have been archiving Delhi's bus realtime GTFS feeds on my server for a month now and collating the data into a flat CSV. Sharing that data at this link for download and analysis:
>> https://server.nikhilvj.co.in/place1/
>>
>> Hoping some folks can do some analysis, visualizations or the like with it; I don't have time to delve too much into that right now. It's been a great learning experience arranging the scripts and structures on my DigitalOcean server to make this long-term continuous archival process possible.
>>
>> Disclaimer: My default reply to every piece of sage advice that starts with "Why don't you.." is: "Sounds good, please do it and get back with the results." I'm satisfied at my end and am sharing the data wealth here for others to take forward, so don't bug me, just take it and go! ;)
>>
>> Some more notes:
>> 1. Get 7-Zip portable / p7zip-full to uncompress it. The uncompressed file is around 6 GB; compressed, it's about half a GB.
>> 2. There may be many repetitions in the data, since the feeds were coming in every 30 secs, and my last email's analysis showed there were repetitions within a single feed as well. So there's a data-cleaning challenge here for you: remove the repetitions. (Do it; don't expect things to be already done for you unless you're paying a fortune for it!)
>> 3. If there is too much traffic on my server then I'll lock it all down with username-password restrictions. So don't do silly things like telling a whole class of students to download it from online only. Use a pen drive or your LAN.
>> 4. There is an accompanying reports CSV that gives file-level summaries.
>> 5. Timestamps are in epoch format in the UTC timezone (as per GTFS-RT specs). Look up "epoch converter". In the reports file I've added 5.5 hrs to get human-readable times in IST.
>> 6. The data covers all dates from 19 Feb onwards. Moving forward I might make the scripts store things month-wise or week-wise; here it was important to start ASAP.
>> 7. Early every morning my scripts will place a fresh version of the data there and remove the previous day's one. So don't be downloading stuff from there at 5 am.
>> 8. Tip: Python? Wanna map? Check out folium.
>> 9. Tip: Folium? Wanna share the ipynb notebook? Check out kyso.io
>>
>> PS: Thanks JohnsonC for the kind words. But that is because I use datameet from Google Groups instead of from my mailbox, unless it's an immediate followup. So it's like StackExchange for me, and it saves me time and effort.
>>
>> Cheers
>> Nikhil VJ
>> Pune, India
>>
>> On Thursday, March 14, 2019 at 11:50:24 AM UTC+5:30, JohnsonC wrote:
>>>
>>> This is helpful.
>>> Thanks for updating on this Nikhil.
>>> This thread was from November and you bothered to search and update it.
>>>
>>> Thanks,
>>>
>>> On Wed, 13 Mar 2019 at 20:38, Nikhil VJ <nikh...@gmail.com> wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> Sometime last month my API key for the realtime feed of Delhi bus data started working. (Link to register for yours: <https://otd.delhi.gov.in/data/realtime/>)
>>>>
>>>> Here's an "unboxing" of one gtfs-realtime VehiclePosition feed file from there:
>>>> http://kyso.io/answerquest/delhi-gtfs-rt-feed-file-analysis
>>>>
>>>> Note: I'm guessing this is not DTC but other bus services operating in Delhi.
>>>>
>>>> - Nikhil VJ
>>>> Pune, India
>>>> https://nikhilvj.co.in
>>>>
>>>> On Thursday, November 29, 2018 at 5:17:39 PM UTC+5:30, Nikhil VJ wrote:
>>>>>
>>>>> Hi Arun,
>>>>>
>>>>> This data doesn't include a shapes.txt file; that script probably requires it. shapes.txt is not mandatory in GTFS. The routes are defined as sequences of stops in stop_times.txt (multiplied by the number of trips in a day, that is).
>>>>>
>>>>> There's room here for improvement. Here's a full GTFS validator output for the Delhi data:
>>>>> http://nikhilvj.co.in/files/delhi_gtfs/delhi-gtfs-2.html
>>>>>
>>>>> One peculiarity: the routes have been split up into separate onward and return journey routes.
>>>>>
>>>>> If anybody knows someone on the technical team of this, kindly connect me with them. The project leads are probably too busy handling realtime data access requests and won't take too kindly to feedback about what improvements can be done on the static side, but I might be able to put something across to the technical folks.
>>>>>
>>>>> You can zip up and import the static GTFS files into the static GTFS Manager <https://github.com/WRI-Cities/static-GTFS-manager> tool. If someone wants to draw the shapefiles of the routes and add them in, the "Default Sequence" page will help you do that.
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Nikhil VJ
>>>>> +91-966-583-1250
>>>>> Pune, India
>>>>> http://nikhilvj.co.in
>>>>>
>>>>> On Tue, Nov 27, 2018 at 1:50 AM Arun Ganesh wrote:
>>>>>
>>>>>> Was anyone able to convert the GTFS feed into a GeoJSON?
>>>>>>
>>>>>> I tried https://github.com/BlinkTagInc/gtfs-to-geojson but for some reason it does not produce any route lines.
>>>
>>> --
>>> Warm Regards,
>>> Johnson Chetty

--
Datameet is a community of Data Science enthusiasts in India.
Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.